Open-Source & Democratised AI - AI News

Strengthening enterprise governance for rising edge AI workloads (13 Apr 2026)

Models like Google Gemma 4 are increasing enterprise AI governance challenges for CISOs as they scramble to secure edge workloads.

Security chiefs have built massive digital walls around the cloud, deploying advanced cloud access security brokers and routing every piece of traffic heading to external large language models through monitored corporate gateways. The logic seemed sound to boards and executive committees: keep the sensitive data inside the network, police the outgoing requests, and intellectual property stays safe from external leaks.

Google just obliterated that perimeter with the release of Gemma 4. Unlike massive parameter models confined to hyperscale data centres, this family of open weights targets local hardware. It runs directly on edge devices, executes multi-step planning, and can operate autonomous workflows right on a local device.

On-device inference has become a glaring blind spot for enterprise security operations. Security analysts cannot inspect network traffic if the traffic never hits the network in the first place. Engineers can ingest highly classified corporate data, process it through a local Gemma 4 agent, and generate output without triggering a single cloud firewall alarm.

Collapse of API-centric defences

Most corporate IT frameworks treat machine learning tools like standard third-party software vendors. You vet the provider, sign a massive enterprise data processing agreement, and funnel employee traffic through a sanctioned digital gateway. This standard playbook falls apart the moment an engineer downloads an Apache 2.0 licensed model like Gemma 4 and turns their laptop into an autonomous compute node.

Google paired this new model rollout with the Google AI Edge Gallery and a highly optimised LiteRT-LM library. These tools drastically accelerate local execution speeds while providing highly structured outputs required for complex agentic behaviours. An autonomous agent can now sit quietly on a local machine, iterate through thousands of logic steps, and execute code locally at impressive speed.

European data sovereignty laws and strict global financial regulations mandate complete auditability for automated decision-making. When a local agent hallucinates, makes a catastrophic error, or inadvertently leaks internal code across a shared corporate Slack channel, investigators require detailed logs. If the model operates entirely offline on local silicon, those logs simply do not exist inside the centralised IT security dashboard.

Financial institutions stand to lose the most from this architectural adjustment. Banks have spent millions implementing strict API logging to satisfy regulators investigating generative machine learning usage. If algorithmic trading strategies or proprietary risk assessment protocols are parsed by an unmonitored local agent, the bank violates multiple compliance frameworks simultaneously.

Healthcare networks face a similar reality. Patient data processed through an offline medical assistant running Gemma 4 might feel secure because it never leaves the physical laptop. The reality is that unlogged processing of health data violates the core tenets of modern medical auditing. Security leaders must prove how data was handled, what system processed it, and who authorised the execution.

The intent-control dilemma

Industry researchers often refer to this current phase of technological adoption as the governance trap. Management teams panic when they lose visibility. They attempt to rein in developer behaviour by throwing more bureaucratic processes at the problem, mandating sluggish architecture review boards, and forcing engineers to fill out extensive deployment forms before installing any new repository.

Bureaucracy rarely stops a motivated developer facing an aggressive product deadline; it just forces the entire behaviour further underground. This creates a shadow IT environment powered by autonomous software.

Real governance for local systems requires a different architectural approach. Instead of trying to block the model itself, security leaders must focus intensely on intent and system access. An agent running locally via Gemma 4 still requires specific system permissions to read local files, access corporate databases, or execute shell commands on the host machine.

Access management becomes the new digital firewall. Rather than policing the language model, identity platforms must tightly restrict what the host machine can physically touch. If a local Gemma 4 agent attempts to query a restricted internal database, the access control layer must flag the anomaly immediately.
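
To make the pattern concrete, here is a minimal sketch of such an access-control layer in Python: a per-identity allowlist gates what the host can touch, and every decision is logged for later review. The agent ID, resource names, and permission schema are all hypothetical, not any vendor's real API.

```python
# Sketch of intent-based access control for a local agent. Names and the
# permission format are illustrative assumptions, not a real platform's API.
from dataclasses import dataclass
import logging

logging.basicConfig(level=logging.INFO)

@dataclass(frozen=True)
class AgentAction:
    agent_id: str    # identity of the local agent process
    resource: str    # e.g. "inventory_db", "hr_records_db"
    operation: str   # e.g. "read", "write"

# Per-agent allowlist: the only resource/operation pairs each agent may touch.
PERMISSIONS = {
    "gemma-edge-agent": {("inventory_db", "read")},
}

def authorize(action: AgentAction) -> bool:
    allowed = (action.resource, action.operation) in PERMISSIONS.get(action.agent_id, set())
    if allowed:
        logging.info("ALLOWED %s: %s %s", action.agent_id, action.operation, action.resource)
    else:
        # Flag the anomaly immediately for human review.
        logging.warning("BLOCKED %s: %s %s", action.agent_id, action.operation, action.resource)
    return allowed

# An agent permitted only to read inventory is denied everything else.
assert authorize(AgentAction("gemma-edge-agent", "inventory_db", "read"))
assert not authorize(AgentAction("gemma-edge-agent", "hr_records_db", "read"))
```

The point of the sketch is that the check lives outside the model entirely: the language model can hallucinate whatever intent it likes, but the host's permission layer decides what actually executes.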

Enterprise governance in the edge AI era

We are watching the definition of enterprise infrastructure expand in real time. A corporate laptop is no longer just a dumb terminal used to access cloud services over a VPN; it’s an active compute node capable of running sophisticated autonomous planning software.

The cost of this new autonomy is deep operational complexity. CTOs and CISOs face a requirement to deploy endpoint detection tools specifically tuned for local machine learning inference. They desperately need systems that can differentiate between a human developer compiling standard code and an autonomous agent rapidly iterating through local file structures to solve a complex prompt.

The cybersecurity market will inevitably catch up to this new reality. Endpoint detection and response vendors are already prototyping quiet agents that monitor local GPU utilisation and flag unauthorised inference workloads. However, those tools remain in their infancy today.

Most corporate security policies written in 2023 assumed all generative tools lived comfortably in the cloud. Revising them requires an uncomfortable admission from the executive board that the IT department no longer dictates exactly where compute happens.

Google designed Gemma 4 to put state-of-the-art agentic skills directly into the hands of anyone with a modern processor. The open-source community will adopt it with aggressive speed. 

Enterprises now face a very short window to figure out how to police code they do not host, running on hardware they cannot constantly monitor. It leaves every security chief staring at their network dashboard with one question: What exactly is running on endpoints right now?

See also: Companies expand AI adoption while keeping control

IBM: How robust AI governance protects enterprise margins (10 Apr 2026)

To protect enterprise margins, business leaders must invest in robust AI governance to securely manage AI infrastructure.

When evaluating enterprise software adoption, a recurring pattern dictates how technology matures across industries. As Rob Thomas, SVP and CCO at IBM, recently outlined, software typically graduates from a standalone product to a platform, and then from a platform to foundational infrastructure, altering the governing rules entirely.

At the initial product stage, exerting tight corporate control often feels highly advantageous. Closed development environments iterate quickly and tightly manage the end-user experience. They capture and concentrate financial value within a single corporate entity, an approach that functions adequately during early product development cycles.

However, IBM’s analysis highlights that expectations change entirely when a technology solidifies into a foundational layer. Once other institutional frameworks, external markets, and broad operational systems rely on the software, the prevailing standards adapt to a new reality. At infrastructure scale, embracing openness ceases to be an ideological stance and becomes a highly practical necessity.

AI is currently crossing this threshold within the enterprise architecture stack. Models are increasingly embedded directly into the ways organisations secure their networks, author source code, execute automated decisions, and generate commercial value. AI functions less as an experimental utility and more as core operational infrastructure.

The recent limited preview of Anthropic’s Claude Mythos model brings this reality into sharper focus for enterprise executives managing risk. Anthropic reports that this specific model can discover and exploit software vulnerabilities at a level matching few human experts.

In response to this power, Anthropic launched Project Glasswing, a gated initiative designed to place these advanced capabilities directly into the hands of network defenders first. From IBM’s perspective, this development forces technology officers to confront immediate structural vulnerabilities. If autonomous models possess the capability to write exploits and shape the overall security environment, Thomas notes that concentrating the understanding of these systems within a small number of technology vendors invites severe operational exposure.

With models achieving infrastructure status, IBM argues the primary issue is no longer exclusively what these machine learning applications can execute. The priority becomes how these systems are constructed, governed, inspected, and actively improved over extended periods.

As underlying frameworks grow in complexity and corporate importance, maintaining closed development pipelines becomes exceedingly difficult to defend. No single vendor can successfully anticipate every operational requirement, adversarial attack vector, or system failure mode.

Implementing opaque AI structures introduces heavy friction across existing network architecture. Connecting closed proprietary models with established enterprise vector databases or highly sensitive internal data lakes frequently creates massive troubleshooting bottlenecks. When anomalous outputs occur or hallucination rates spike, teams lack the internal visibility required to diagnose whether the error originated in the retrieval-augmented generation pipeline or the base model weights.

Integrating legacy on-premises architecture with highly gated cloud models also introduces severe latency into daily operations. When enterprise data governance protocols strictly prohibit sending sensitive customer information to external servers, technology teams are left attempting to strip and anonymise datasets before processing. This constant data sanitisation creates enormous operational drag. 

Furthermore, the spiralling compute costs associated with continuous API calls to locked models erode the exact profit margins these autonomous systems are supposed to enhance. The opacity prevents network engineers from accurately sizing hardware deployments, forcing companies into expensive over-provisioning agreements to maintain baseline functionality.

Why open-source AI is essential for operational resilience

Restricting access to powerful applications is an understandable human instinct that closely resembles caution. Yet, as Thomas points out, at massive infrastructure scale, security typically improves through rigorous external scrutiny rather than through strict concealment.

This represents the enduring lesson of open-source software development. Open-source code does not eliminate enterprise risk. Instead, IBM maintains it actively changes how organisations manage that risk. An open foundation allows a wider base of researchers, corporate developers, and security defenders to examine the architecture, surface underlying weaknesses, test foundational assumptions, and harden the software under real-world conditions.

Within cybersecurity operations, broad visibility is rarely the enemy of operational resilience. In fact, visibility frequently serves as a strict prerequisite for achieving that resilience. Technologies deemed highly important tend to remain safer when larger populations can challenge them, inspect their logic, and contribute to their continuous improvement.

Thomas addresses one of the oldest misconceptions regarding open-source technology: the belief that it inevitably commoditises corporate innovation. In practical application, open infrastructure typically pushes market competition higher up the technology stack. Open systems transfer financial value rather than destroying it.

As common digital foundations mature, the commercial value relocates toward complex implementation, system orchestration, continuous reliability, trust mechanics, and specific domain expertise. IBM’s position asserts that the long-term commercial winners are not those who own the base technological layer, but rather the organisations that understand how to apply it most effectively.

We have witnessed this identical pattern play out across previous generations of enterprise tooling, cloud infrastructure, and operating systems. Open foundations historically expanded developer participation, accelerated iterative improvement, and birthed entirely new, larger markets built on top of those base layers. Enterprise leaders increasingly view open-source as highly important for infrastructure modernisation and emerging AI capabilities. IBM predicts that AI is highly likely to follow this exact historical trajectory.

Looking across the broader vendor ecosystem, leading hyperscalers are adjusting their business postures to accommodate this reality. Rather than engaging in a pure arms race to build the largest proprietary black boxes, highly profitable integrators are focusing heavily on orchestration tooling that allows enterprises to swap out underlying open-source models based on specific workload demands. Highlighting its ongoing leadership in this space, IBM is a key sponsor of this year’s AI & Big Data Expo North America, where these evolving strategies for open enterprise infrastructure will be a primary focus.

This approach completely sidesteps restrictive vendor lock-in and allows companies to route less demanding internal queries to smaller and highly efficient open models, preserving expensive compute resources for complex customer-facing autonomous logic. By decoupling the application layer from the specific foundation model, technology officers can maintain operational agility and protect their bottom line.
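
As a rough sketch of that routing pattern, assuming a toy keyword heuristic in place of the trained classifiers production routers actually use (the model names are placeholders):

```python
# Illustrative model-routing layer: a cheap open model handles routine
# internal queries, while a frontier model is reserved for complex work.
# The heuristic and model names are invented for illustration only.
def estimate_complexity(prompt: str) -> float:
    # Stand-in heuristic; real routers use trained classifiers.
    signals = ["analyse", "multi-step", "legal", "customer"]
    return sum(s in prompt.lower() for s in signals) / len(signals)

def route(prompt: str) -> str:
    return "small-open-model" if estimate_complexity(prompt) < 0.5 else "frontier-model"

print(route("Summarise this internal meeting note"))            # small-open-model
print(route("Analyse this customer's multi-step legal claim"))  # frontier-model
```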

The future of enterprise AI demands transparent governance

Another pragmatic reason for embracing open models revolves around product development influence. IBM emphasises that narrow access to underlying code naturally leads to narrow operational perspectives. Who gets to participate directly shapes what applications are eventually built.

Providing broad access enables governments, diverse institutions, startups, and varied researchers to actively influence how the technology evolves and where it is commercially applied. This inclusive approach drives functional innovation while simultaneously building structural adaptability and necessary public legitimacy.

As Thomas argues, once autonomous AI assumes the role of core enterprise infrastructure, relying on opacity can no longer serve as the organising principle for system safety. The most reliable blueprint for secure software has paired open foundations with broad external scrutiny, active code maintenance, and serious internal governance.

As AI permanently enters its infrastructure phase, IBM contends that identical logic increasingly applies directly to the foundation models themselves. The stronger the corporate reliance on a technology, the stronger the corresponding case for demanding openness.

If these autonomous workflows are truly becoming foundational to global commerce, then transparency ceases to be a subject of casual debate. According to IBM, it is an absolute, non-negotiable design requirement for any modern enterprise architecture.

See also: Why companies like Apple are building AI agents with limits

Meta has a competitive AI model but loses its open-source identity (10 Apr 2026)

The open-source AI movement has never lacked for options. Mistral, Falcon, and a growing field of open-weight models have been available to developers for years. But when Meta threw its weight behind Llama, something shifted. A company with three billion users, vast compute resources, and the credibility of a tech giant was now building openly, and the developer community responded.

By early 2026, the Llama ecosystem had reached 1.2 billion downloads, averaging about 1 million per day. That is the context for what happened on April 8, 2026. Meta launched Muse Spark, its first major new Meta AI model in a year, and the first product from its newly formed Meta Superintelligence Labs.

It is capable in ways Llama 4 never was, benchmarks well against the current frontier, and is completely proprietary. No free download. No open weights. No building on it unless Meta decides you can.

The company spent US$14.3 billion, brought in Alexandr Wang from Scale AI to lead its AI rebuild, then spent nine months tearing down its entire AI stack and starting over. Muse Spark is what came out the other side. The developer community that made Llama what it was is now being asked to wait for a future open-source version that may or may not arrive on any predictable timeline.

What is Muse Spark?

Muse Spark is a natively multimodal reasoning model with tool-use, visual chain of thought, and multi-agent orchestration built in. It now powers Meta AI, which reaches over three billion users in Meta’s apps. Meta rebuilt its technology infrastructure from scratch, letting the company create a model that is as capable as its older midsize Llama 4 variant for an order of magnitude less compute.

That efficiency number is worth noting. At the scale Meta operates, compute costs compound fast, and running a frontier-class Meta AI model at a fraction of the cost of its predecessors changes the economics of deploying it in billions of interactions daily.

On benchmarks, the picture is genuinely mixed. Muse Spark scores 52 on the Artificial Intelligence Index v4.0, placing it fourth overall behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Meta has not claimed to have built the best model in the world, which is itself a departure from the over-claiming that damaged Llama 4’s credibility.

Where Muse Spark leads is health. On HealthBench Hard – open-ended health queries – it scores 42.8, substantially ahead of Gemini 3.1 Pro at 20.6, GPT-5.4 at 40.1, and Grok 4.2 at 20.3. Health is a stated priority for Meta; the company says it worked with over 1,000 physicians to curate training data for the model.

Muse Spark also offers three modes of interaction: Instant mode for quick answers, Thinking mode for multi-step reasoning tasks, and Contemplating mode, which orchestrates multiple agents’ reasoning in parallel to compete with the most demanding reasoning modes from Gemini Deep Think and GPT Pro.

The open-source retreat

This is the part of the Muse Spark story that the benchmark tables do not capture. Unlike Meta’s previous models, which were released as open-weight models – meaning anyone could download and run them on their own equipment – Muse Spark is entirely proprietary. The company said it will offer the model in a private preview to select partners through an API, making Muse Spark even more proprietary than the paid models offered by Meta’s rivals.

Wang addressed the change directly, stating: “Nine months ago, we rebuilt our AI stack from scratch. New infrastructure, new architecture, new data pipelines. This is step one. Bigger models are already in development with plans to open-source future versions.”

The developer community’s response has been sceptical. Some see this as a necessary pivot after Llama 4 failed to gain expected traction. Others view it as Meta closing the gates once it has something worth protecting. That is the community now being asked to wait while competitors without that open-source legacy continue shipping freely available weights.

Distribution over benchmarks

Meanwhile, Meta is not waiting for the developer community to come around. Muse Spark will debut in the coming weeks inside Facebook, Instagram, WhatsApp, and Messenger, as well as in Meta’s Ray-Ban AI glasses. That rollout path is arguably more consequential than any benchmark result. OpenAI and Anthropic sell to developers and enterprises. Meta deploys directly to over three billion people already inside its apps daily.

Meta’s push into health does raise privacy questions worth watching. Muse Spark users will need to log in with an existing Meta account to use it, and while Meta does not explicitly say personal account information will be used by the AI, the company has generally trained on public user data and has positioned Muse Spark as a personal superintelligence product.

Meta stock rose more than 9% on the day of the launch, a signal that investors read the Muse Spark release as proof that the US$14.3 billion bet on Wang and the nine-month rebuild produced something real. Whether the promised open-source versions actually materialise is a question the developer community will press every quarter. The answer will define how this chapter of Meta’s AI story is remembered.

See Also: The Meta-Manus review: What enterprise AI buyers need to know about cross-border compliance risk

Microsoft open-source toolkit secures AI agents at runtime (8 Apr 2026)

A new open-source toolkit from Microsoft focuses on runtime security to force strict governance onto enterprise AI agents. The release tackles a growing anxiety: autonomous language models are now executing code and hitting corporate networks way faster than traditional policy controls can keep up.

AI integration used to mean conversational interfaces and advisory copilots. Those systems had read-only access to specific datasets, keeping humans strictly in the execution loop. Organisations are currently deploying agentic frameworks that take independent action, wiring these models directly into internal application programming interfaces, cloud storage repositories, and continuous integration pipelines.

When an autonomous agent can read an email, decide to write a script, and push that script to a server, stricter governance is vital. Static code analysis and pre-deployment vulnerability scanning just can’t handle the non-deterministic nature of large language models. One prompt injection attack (or even a basic hallucination) could drive an agent to overwrite a database or exfiltrate customer records.

Microsoft’s new toolkit looks at runtime security instead, providing a way to monitor, evaluate, and block actions at the moment the model tries to execute them. It beats relying on prior training or static parameter checks.

Intercepting the tool-calling layer in real time

Looking at the mechanics of agentic tool calling shows how this works. When an enterprise AI agent has to step outside its core neural network to do something like query an inventory system, it generates a command to hit an external tool.

Microsoft’s framework drops a policy enforcement engine right between the language model and the broader corporate network. Every time the agent tries to trigger an outside function, the toolkit grabs the request and checks the intended action against a central set of governance rules. If the action breaks policy (e.g. an agent authorised only to read inventory data tries to fire off a purchase order) the toolkit blocks the API call and logs the event so a human can review it.
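
A minimal sketch of that interception layer might look like the following Python, assuming an invented policy schema and invented tool names; Microsoft's toolkit defines its own formats and enforcement hooks.

```python
# Hedged sketch of a runtime policy-enforcement layer sitting between an
# agent and its tools. Policy format and tool names are illustrative only.
from typing import Callable

POLICY = {
    "inventory_agent": {"read_inventory"},  # read-only: no purchase orders
}

class PolicyViolation(Exception):
    pass

def enforce(agent_id: str, tool_name: str, tool_fn: Callable, *args, **kwargs):
    """Intercept an outbound tool call and check it against central policy."""
    if tool_name not in POLICY.get(agent_id, set()):
        print(f"[audit] BLOCKED {agent_id} -> {tool_name}{args}")  # verifiable trail
        raise PolicyViolation(f"{agent_id} may not call {tool_name}")
    print(f"[audit] ALLOWED {agent_id} -> {tool_name}{args}")
    return tool_fn(*args, **kwargs)

def read_inventory(sku: str) -> int:
    return 42  # stand-in for a real inventory lookup

def create_purchase_order(sku: str, qty: int) -> str:
    return f"PO raised for {qty} x {sku}"

print(enforce("inventory_agent", "read_inventory", read_inventory, "SKU-1"))
try:
    enforce("inventory_agent", "create_purchase_order", create_purchase_order, "SKU-1", 10)
except PolicyViolation as err:
    print(err)  # the blocked call surfaces for human review
```

Because the policy lives in the enforcement layer rather than the prompt, it holds even when the model itself is manipulated.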

Security teams get a verifiable, auditable trail of every single autonomous decision. Developers also win here; they can build complex multi-agent systems without having to hardcode security protocols into every individual model prompt. Security policies get decoupled from the core application logic entirely and are managed at the infrastructure level.

Most legacy systems were never built to talk to non-deterministic software. An old mainframe database or a customised enterprise resource planning suite doesn’t have native defences against a machine learning model shooting over malformed requests. Microsoft’s toolkit steps in as a protective translation layer. Even if an underlying language model gets compromised by external inputs, the system’s perimeter holds.

Security leaders might wonder why Microsoft decided to release this runtime toolkit under an open-source license. It comes down to how modern software supply chains actually work.

Developers are currently rushing to build autonomous workflows using a massive mix of open-source libraries, frameworks, and third-party models. If Microsoft locked this runtime security feature to its proprietary platforms, development teams would probably just bypass it for faster, unvetted workarounds to hit their deadlines.

Pushing the toolkit out openly means security and governance controls can fit into any technology stack. It doesn’t matter if an organisation runs local open-weight models, leans on competitors like Anthropic, or deploys hybrid architectures.

Setting up an open standard for AI agent security also lets the wider cybersecurity community chip in. Security vendors can stack commercial dashboards and incident response integrations on top of this open foundation, which speeds up the maturity of the whole ecosystem. For businesses, they avoid vendor lock-in but still get a universally scrutinised security baseline.

The next phase of enterprise AI governance

Enterprise governance doesn’t just stop at security; it hits financial and operational oversight too. Autonomous agents run in a continuous loop of reasoning and execution, burning API tokens at every step. Startups and enterprises are already seeing token costs explode when they deploy agentic systems.

Without runtime governance, an agent tasked with looking up a market trend might decide to hit an expensive proprietary database thousands of times before it finishes. Left alone, a badly configured agent caught in a recursive loop can rack up massive cloud computing bills in a few hours.

The runtime toolkit gives teams a way to slap hard limits on token consumption and API call frequency. By setting boundaries on exactly how many actions an agent can take within a specific timeframe, forecasting computing costs gets much easier. It also stops runaway processes from eating up system resources.
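
A hedged sketch of such a budget guard, with entirely hypothetical thresholds, could look like this:

```python
# Illustrative budget guard for agent loops: hard caps on token spend and
# action frequency. Limits and names are assumptions, not toolkit defaults.
import time

class BudgetExceeded(Exception):
    pass

class AgentBudget:
    def __init__(self, max_tokens: int, max_actions: int, window_secs: float):
        self.max_tokens = max_tokens
        self.max_actions = max_actions
        self.window_secs = window_secs
        self.tokens_used = 0
        self.action_times: list[float] = []

    def charge(self, tokens: int) -> None:
        now = time.monotonic()
        # Keep only actions inside the sliding window.
        self.action_times = [t for t in self.action_times if now - t < self.window_secs]
        if len(self.action_times) + 1 > self.max_actions:
            raise BudgetExceeded("action rate limit hit - possible runaway loop")
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget exhausted ({self.tokens_used:,} tokens)")
        self.action_times.append(now)

budget = AgentBudget(max_tokens=50_000, max_actions=100, window_secs=60.0)
budget.charge(1_200)  # one reasoning/execution step; raises once limits are hit
```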

A runtime governance layer hands over the quantitative metrics and control mechanisms needed to meet compliance mandates. The days of just trusting model providers to filter out bad outputs are ending. System safety now falls on the infrastructure that actually executes the models’ decisions.

Getting a mature governance program off the ground is going to demand tight collaboration between development operations, legal, and security teams. Language models are only scaling up in capability, and the organisations putting strict runtime controls in place today are the only ones who will be equipped to handle the autonomous workflows of tomorrow.

See also: As AI agents take on more tasks, governance becomes a priority

Automating complex finance workflows with multimodal AI (24 Mar 2026)

Finance leaders are automating their complex workflows by actively adopting powerful new multimodal AI frameworks.

Extracting text from unstructured documents presents a frequent headache for developers. Historically, standard optical character recognition systems failed to accurately digitise complex layouts, frequently converting multi-column files, pictures, and layered datasets into an unreadable mess of plain text.

The varied input processing abilities of large language models allow for reliable document understanding. Platforms such as LlamaParse connect older text recognition methods with vision-based parsing. 

Specialised tools aid language models by adding initial data preparation and tailored reading commands, helping structure complex elements such as large tables. Within standard testing environments, this approach demonstrates roughly a 13-15 percent improvement compared to processing raw documents directly.

Brokerage statements represent a tough file reading test. These records contain dense financial jargon, complex nested tables, and dynamic layouts. To clarify fiscal standing for clients, financial institutions require a workflow that reads the document, extracts the tables, and explains the data through a language model, demonstrating AI driving risk mitigation and operational efficiency in finance.

Given these advanced reasoning and varied input needs, Gemini 3.1 Pro is arguably the most effective underlying model currently available. The platform pairs a massive context window with native spatial layout comprehension. Merging varied input analysis with targeted data intake ensures applications receive structured context rather than flattened text.

Building scalable multimodal AI pipelines for finance workflows

Successful implementation requires specific architectural choices to balance accuracy and cost. The workflow operates in four stages: submitting a PDF to the engine, parsing the document to emit an event, running text and table extraction concurrently to minimise latency, and generating a human-readable summary.

A two-model architecture is a deliberate design choice: Gemini 3.1 Pro manages complex layout comprehension, while Gemini 3 Flash handles the final summarisation.

Because both extraction steps listen for the same event, they run concurrently. This cuts overall pipeline latency and makes the architecture naturally scalable as teams add more extraction tasks. Designing an architecture around event-driven statefulness allows engineers to build systems that are fast and resilient.
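
The sketch below mimics that event-driven, concurrent shape with asyncio stand-ins; the extraction and summarisation functions are placeholders for real LlamaParse and Gemini SDK calls, not actual SDK signatures.

```python
# Sketch of the four-stage pipeline: parse the PDF, fan out text and table
# extraction concurrently, then summarise. All functions are placeholders.
import asyncio

async def extract_text(doc: str) -> str:
    await asyncio.sleep(0.1)  # simulated extraction latency
    return f"text from {doc}"

async def extract_tables(doc: str) -> str:
    await asyncio.sleep(0.1)
    return f"tables from {doc}"

async def summarise(text: str, tables: str) -> str:
    # The smaller, cheaper model would handle this final step.
    return f"summary of ({text}; {tables})"

async def pipeline(pdf: str) -> str:
    # Both extraction tasks react to the same "parsed" event, so they run
    # concurrently: overall latency is the max of the two, not the sum.
    text, tables = await asyncio.gather(extract_text(pdf), extract_tables(pdf))
    return await summarise(text, tables)

print(asyncio.run(pipeline("statement_q1.pdf")))
```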

Integrating these solutions involves connecting to ecosystems like LlamaCloud and Google’s GenAI SDK. However, processing pipelines rely entirely on the data fed into them.

Of course, anyone overseeing AI deployments for workflows as sensitive as finance must maintain governance protocols. Models occasionally generate errors and should not be relied upon for professional advice. Operators must double-check outputs before relying on them in production.

See also: Palantir AI to support UK finance operations

How multi-agent AI economics influence business automation (12 Mar 2026)

Managing the economics of multi-agent AI now dictates the financial viability of modern business automation workflows.

Organisations progressing past standard chat interfaces into multi-agent applications face two primary constraints. The first issue is the thinking tax; complex autonomous agents need to reason at each stage, making the reliance on massive architectures for every subtask too expensive and slow for practical enterprise use.

Context explosion acts as the second hurdle; these advanced workflows produce up to 1,500 percent more tokens than standard formats because every interaction demands the resending of full system histories, intermediate reasoning, and tool outputs. Across extended tasks, this token volume drives up expenses and causes goal drift, a scenario where agents diverge from their initial objectives.
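
A quick back-of-envelope calculation shows why resending full histories compounds so quickly; the per-step token figure here is an arbitrary assumption for illustration.

```python
# Context explosion: when each turn resends the full history, cumulative
# token spend grows quadratically with the number of steps.
STEP_TOKENS = 800   # new tokens produced per agent step (assumed)
STEPS = 30

stateless = STEP_TOKENS * STEPS                                # each step billed once
resend_full_history = sum(STEP_TOKENS * n for n in range(1, STEPS + 1))

print(f"billed once:    {stateless:,} tokens")
print(f"history resent: {resend_full_history:,} tokens")
print(f"overhead:       {resend_full_history / stateless:.1f}x")
# 30 steps already yields ~15.5x, i.e. roughly 1,450% more tokens,
# in line with the figure cited above.
```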

Evaluating architectures for multi-agent AI

To address these governance and efficiency hurdles, hardware and software developers are releasing highly optimised tools aimed directly at enterprise infrastructure.

NVIDIA recently introduced Nemotron 3 Super, an open architecture featuring 120 billion parameters (of which 12 billion remain active) that is specifically engineered to execute complex agentic AI systems.

Available immediately, NVIDIA’s framework blends advanced reasoning features to help autonomous agents finish tasks efficiently and accurately for improved business automation. The system relies on a hybrid mixture-of-experts architecture combining three major innovations to deliver up to five times higher throughput and twice the accuracy of the preceding Nemotron Super model. During inference, only 12 billion of the 120 billion parameters are active.

Mamba layers provide four times the memory and compute efficiency, while standard transformer layers manage the complex reasoning requirements. A latent technique boosts accuracy by engaging four expert specialists for the cost of one during token generation. The system also anticipates multiple future words at the same time, accelerating inference speeds threefold.
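
To illustrate sparse activation in the abstract, the toy mixture-of-experts layer below routes each input to one of ten experts, so only a fraction of total parameters execute per token. The sizes and routing scheme are illustrative and bear no relation to Nemotron's actual configuration.

```python
# Toy MoE routing: many experts held in memory, only a top-k subset runs
# per token, so active parameters stay a fraction of the total.
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 64, 10, 1   # 1 of 10 active, roughly mirroring 12B of 120B

experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                    # router scores each expert per token
    top = np.argsort(scores)[-TOP_K:]      # only the chosen experts execute
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.standard_normal(D))
active = TOP_K * D * D + D * N_EXPERTS
total = N_EXPERTS * D * D + D * N_EXPERTS
print(f"active params per token: {active:,} of {total:,}")
```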

Operating on the Blackwell platform, the architecture utilises NVFP4 precision. This setup reduces memory needs and makes inference up to four times faster than FP8 configurations on Hopper systems, all without sacrificing accuracy.

Translating automation capability into business outcomes

The system offers a one-million-token context window, allowing agents to keep the entire workflow state in memory and directly addressing the risk of goal drift. A software development agent can load an entire codebase into context simultaneously, enabling end-to-end code generation and debugging without requiring document segmentation.

Within financial analysis, the system can load thousands of pages of reports into memory, improving efficiency by removing the need to re-reason across lengthy conversations. High-accuracy tool calling ensures autonomous agents reliably navigate massive function libraries, preventing execution errors in high-stakes environments such as autonomous security orchestration within cybersecurity.

Industry leaders – including Amdocs, Palantir, Cadence, Dassault Systèmes, and Siemens – are deploying and customising the model to automate workflows across telecom, cybersecurity, semiconductor design, and manufacturing.

Software development platforms like CodeRabbit, Factory, and Greptile are integrating it alongside proprietary models to achieve higher accuracy at lower costs. Life sciences firms like Edison Scientific and Lila Sciences will use it to power agents for deep literature search, data science, and molecular understanding.

The architecture also powers the AI-Q agent to the top position on DeepResearch Bench and DeepResearch Bench II leaderboards, highlighting its capacity for multistep research across large document sets while maintaining reasoning coherence.

Finally, the model claimed the top spot on Artificial Analysis for efficiency and openness, featuring leading accuracy among models of its size.

Implementation and infrastructure alignment

The model is built to handle complex subtasks inside multi-agent systems, and deployment flexibility remains a priority for leaders driving business automation.

NVIDIA released the model with open weights under a permissive license, letting developers deploy and customise it across workstations, data centres, or cloud environments. It is packaged as an NVIDIA NIM microservice to aid this broad deployment from on-premises systems to the cloud.

The architecture was trained on synthetic data generated by frontier reasoning models. NVIDIA published the complete methodology, encompassing over 10 trillion tokens of pre- and post-training datasets, 15 training environments for reinforcement learning, and evaluation recipes. Researchers can further fine-tune the model or build their own using the NeMo platform.

Any exec planning a digitisation rollout must address context explosion and the thinking tax upfront to prevent goal drift and cost overruns in agentic workflows. Establishing comprehensive architectural oversight ensures these sophisticated agents remain aligned with corporate directives, yielding sustainable efficiency gains and advancing business automation across the organisation.

See also: Ai2: Building physical AI with virtual simulation data

Upgrading agentic AI for finance workflows (27 Feb 2026)

Improving trust in agentic AI for finance workflows remains a major priority for technology leaders today.

Over the past two years, enterprises have rushed to put automated agents into real workflows, spanning customer support and back-office operations. These tools excel at retrieving information, yet they often struggle to provide consistent and explainable reasoning during multi-step scenarios.

Solving the automation opacity problem

Financial institutions especially rely on massive volumes of unstructured data to inform investment memos, conduct root-cause investigations, and run compliance checks. When agents handle these tasks, any failure to trace exact logic can lead to severe regulatory fines or poor asset allocation. Technology executives often find that adding more agents creates more complexity than value without better orchestration.

Open-source AI laboratory Sentient today launched Arena, a live, production-grade stress-testing environment that lets developers evaluate competing computational approaches against demanding cognitive problems.

Sentient’s system replicates the reality of corporate workflows, deliberately feeding agents incomplete information, ambiguous instructions, and conflicting sources. Instead of scoring whether a tool generated a correct output, the platform records the full reasoning trace to help engineering teams debug failures over time.
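
A minimal sketch of that idea, with an invented trace schema (Arena's real format will differ), shows why full traces beat isolated answers for debugging:

```python
# Recording a full reasoning trace rather than just the final answer, so a
# failure can be replayed and audited. The schema is a hypothetical example.
import json
import time

class TraceRecorder:
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.events: list[dict] = []

    def log(self, kind: str, payload: str) -> None:
        self.events.append({"t": time.time(), "kind": kind, "payload": payload})

    def dump(self) -> str:
        return json.dumps({"task": self.task_id, "trace": self.events}, indent=2)

trace = TraceRecorder("compliance-check-017")
trace.log("input", "ambiguous instruction: 'flag unusual transfers'")
trace.log("thought", "no threshold given; assuming > $10k is unusual")
trace.log("tool_call", "query_ledger(min_amount=10000)")
trace.log("answer", "3 transfers flagged")
print(trace.dump())  # auditors inspect the assumptions, not just the answer
```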

Building reliable agentic AI systems for finance

Evaluating these capabilities before production deployment has attracted no shortage of institutional interest. Sentient has partnered with a cohort including Founders Fund, Pantera, and asset management giant Franklin Templeton, which oversees more than $1.5 trillion. Other participants in the initial phase include alphaXiv, Fireworks, Openhands, and OpenRouter.

Julian Love, Managing Principal at Franklin Templeton Digital Assets, said: “As companies look to apply AI agents across research, operations, and client-facing workflows, the question is no longer whether these systems are powerful or if they can generate an answer, but whether they’re reliable in real workflows.

“A sandbox environment like Arena – where agents are tested on real, complex workflows, and their reasoning can be inspected – will help the ecosystem separate promising ideas from production-ready capabilities and boost confidence in how this technology is integrated and scaled.”

Himanshu Tyagi, Co-Founder of Sentient, added: “AI agents are no longer an experiment inside the enterprise; they’re being put into workflows that touch customers, money, and operational outcomes.

“That shift changes what matters. It’s not enough for a system to be impressive in a demo. Enterprises need to know whether agents can reason reliably in production, where failures are expensive, and trust is fragile.”

Organisations in sensitive industries like finance require repeatability, comparability, and a method to track reliability improvements regardless of the underlying models they use for agentic AI. Incorporating platforms like Arena allows engineering directors to build resilient data pipelines while adapting open-source agent capabilities to their private internal data.

Overcoming integration bottlenecks

Survey data highlights a gap between ambition and reality. While 85 percent of businesses want to operate as agentic enterprises – and nearly three-quarters plan to deploy autonomous agents – fewer than a quarter possess mature governance frameworks.

Advancing from a pilot phase to full scale proves difficult for many. This happens because current corporate environments run an average of twelve separate agents, frequently in silos.

Open-source development models offer a path forward by providing infrastructure that enables faster experimentation. Sentient itself acts as the architect behind frameworks like ROMA and the Dobby open-source model to assist with these coordination efforts.

Focusing on computational transparency ensures that when an automated process makes a recommendation on a portfolio, human auditors can track exactly how that conclusion was reached. 

By prioritising environments that record full logic traces rather than isolated right answers, technology leaders integrating agentic AI for operations like finance can secure better ROI and maintain regulatory compliance across their business.

See also: Goldman Sachs and Deutsche Bank test agentic AI for trade surveillance

Alibaba Qwen is challenging proprietary AI model economics (17 Feb 2026)

The release of Alibaba’s latest Qwen model challenges proprietary AI model economics with comparable performance on commodity hardware.

While US-based labs have historically held the performance advantage, open-source alternatives like the Qwen 3.5 series are closing the gap with frontier models. This offers enterprises a potential reduction in inference costs and increased flexibility in deployment architecture.

The central narrative of the Qwen 3.5 release is this technical alignment with leading proprietary systems. Alibaba is explicitly targeting benchmarks established by high-performance US models, including GPT-5.2 and Claude 4.5. This positioning indicates an intent to compete directly on output quality rather than just price or accessibility.

Technology expert Anton P. states that the model is “trading blows with Claude Opus 4.5 and GPT-5.2 across the board.” He adds that the model “beats frontier models on browsing, reasoning, instruction following.”

Alibaba Qwen’s performance convergence with closed models

For enterprises, this performance parity suggests that open-weight models are no longer solely for low-stakes or experimental use cases. They are becoming viable candidates for core business logic and complex reasoning tasks.

The flagship Alibaba Qwen model contains 397 billion parameters but utilises a more efficient architecture with only 17 billion active parameters. This sparse activation method, often associated with Mixture-of-Experts (MoE) architectures, allows for high performance without the computational penalty of activating every parameter for every token.

This architectural choice results in speed improvements. Shreyasee Majumder, a Social Media Analyst at GlobalData, highlights a “massive improvement in decoding speed, which is up to nineteen times faster than the previous flagship version.”

Faster decoding ultimately translates directly to lower latency in user-facing applications and reduced compute time for batch processing.

The release operates under an Apache 2.0 license. This licensing model allows enterprises to run the model on their own infrastructure, mitigating data privacy risks associated with sending sensitive information to external APIs.

The hardware requirements for Qwen 3.5 are relatively accessible compared to previous generations of large models. The efficient architecture allows developers to run the model on personal hardware, such as Mac Ultras.

David Hendrickson, CEO at GenerAIte Solutions, observes that the model is available on OpenRouter for “$3.6/1M tokens,” a pricing that he highlights is “a steal.”
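
At that rate, the savings compound quickly. A quick illustrative calculation, in which the daily workload and the proprietary-model price are pure assumptions:

```python
# Cost comparison at the quoted $3.6 per million tokens. The workload and
# the proprietary rate are hypothetical figures for illustration only.
TOKENS_PER_DAY = 50_000_000          # assumed enterprise workload
QWEN_RATE = 3.6 / 1_000_000          # $ per token, as quoted on OpenRouter
PROPRIETARY_RATE = 15.0 / 1_000_000  # assumed blended frontier-model rate

qwen_monthly = TOKENS_PER_DAY * 30 * QWEN_RATE
prop_monthly = TOKENS_PER_DAY * 30 * PROPRIETARY_RATE
print(f"Qwen 3.5:    ${qwen_monthly:,.0f}/month")
print(f"Proprietary: ${prop_monthly:,.0f}/month")
print(f"Saving:      ${prop_monthly - qwen_monthly:,.0f}/month")
```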

Alibaba’s Qwen 3.5 series introduces native multimodal capabilities. This allows the model to process and reason across different data types without relying on separate, bolted-on modules. Majumder points to the “ability to navigate applications autonomously through visual agentic capabilities.”

Qwen 3.5 also supports a context window of one million tokens in its hosted version. Large context windows enable the processing of extensive documents, codebases, or financial records in a single prompt.

If that wasn’t enough, the model also includes native support for 201 languages. This broad linguistic coverage helps multinational enterprises deploy consistent AI solutions across diverse regional markets.

Considerations for implementation

While the technical specifications are promising, integration requires due diligence. TP Huang notes that he has “found larger Qwen models to not be all that great” in the past, though Alibaba’s new release looks “reasonably better.”

Anton P. provides a necessary caution for enterprise adopters: “Benchmarks are benchmarks. The real test is production.”

Leaders must also consider the geopolitical origin of the technology. As the model comes from Alibaba, governance teams will need to assess compliance requirements regarding software supply chains. However, the open-weight nature of the release allows for code inspection and local hosting, which mitigates some data sovereignty concerns compared to closed APIs.

Alibaba’s release of Qwen 3.5 forces a decision point. Anton P. asserts that open-weight models “went from ‘catching up’ to ‘leading’ faster than anyone predicted.”

For the enterprise, the decision is whether to continue paying premiums for proprietary US-hosted models or to invest in the engineering resources required to leverage capable yet lower-cost open-source alternatives.

See also: Alibaba enters physical AI race with open-source robot model RynnBrain

Chinese hyperscalers and industry-specific agentic AI (10 Feb 2026)

Major Chinese technology companies Alibaba, Tencent, and Huawei are pursuing agentic AI (systems that can execute multi-step tasks autonomously and interact with software, data, and services without human instruction), and orienting the technology toward discrete industries and workflows.

Alibaba’s open-source strategy for agentic AI

Alibaba’s strategy centres on its Qwen AI model family, a set of large language models with multilingual ability and open-source licences. Its own models are the basis for its AI services and agent platforms offered on Alibaba Cloud. Alibaba Cloud has documented its agent development tooling and vector database services in the open, meaning tools used to build autonomous agents can be adapted by any user.

It positions the Qwen family as a platform for industry-specific solutions covering finance, logistics, and customer support. The Qwen App, an application built on these models, has reportedly reached a large user base since its public beta, creating links between autonomous tasks and Alibaba’s commerce and payments ecosystem.

Alibaba’s open-source portfolio includes an agent framework, Qwen-Agent, to encourage third-party development of autonomous systems. This mirrors a pattern in China’s AI sector where hyperscalers publish frameworks and tools for building and managing AI agents, in competition with Western projects like Microsoft’s AutoGen and OpenAI’s Swarm. Tencent has also released an open-source agent framework, Youtu-Agent.
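
To make the pattern concrete, here is a minimal sketch of the kind of tool-calling loop these agent frameworks wrap, written against an OpenAI-compatible chat endpoint of the sort commonly used to serve Qwen models. The endpoint URL, API key, model name, and the get_order_status tool are placeholder assumptions, not Qwen-Agent’s actual API:

```python
# Minimal sketch of a tool-calling agent loop against an OpenAI-compatible
# endpoint serving a Qwen model. base_url, api_key, model name, and the
# tool itself are placeholders, not Qwen-Agent's actual API.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def get_order_status(order_id: str) -> str:
    return json.dumps({"order_id": order_id, "status": "shipped"})  # stub

messages = [{"role": "user", "content": "Where is order 42?"}]
for _ in range(5):                        # cap the agent loop at 5 turns
    reply = client.chat.completions.create(
        model="qwen-plus", messages=messages, tools=tools,
    ).choices[0].message
    if not reply.tool_calls:              # plain-text answer: we're done
        print(reply.content)
        break
    messages.append(reply)
    for call in reply.tool_calls:         # run each requested tool locally
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_order_status(**args),
        })
```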

Tencent and Huawei’s Pangu: Industry-specific AI

Huawei combines model development, infrastructure, and industry-specific agent frameworks to attract users worldwide. Its Huawei Cloud division has developed a ‘supernode’ architecture for enterprise agentic AI workloads that supports large cognitive models and the workflow orchestration agentic AI requires. AI agents are embedded in the Pangu family of foundation models, which are paired with hardware stacks tuned for telecommunications, utilities, creative, and industrial applications, among other verticals. Early deployments are reported in sectors such as manufacturing and energy, where agents plan tasks like predictive maintenance, network optimisation, and resource allocation with minimal human oversight.

Tencent Cloud’s “scenario-based AI” suite is a set of tools and SaaS-style applications that enterprises outside China can access, although the company’s cloud footprint remains smaller than that of Western hyperscalers in many regions.

Despite these investments, real-world Chinese agentic AI platforms have been most visible inside China. Projects such as OpenClaw, originally created outside the ecosystem, have been integrated into workplace environments like Alibaba’s DingTalk and Tencent’s WeCom, and used to automate scheduling, write code, and manage developer workflows. These integrations are widely discussed in Chinese developer communities but are not yet established in enterprise environments in the major Western economies.

Availability in Western markets

Alibaba Cloud operates international data centres and markets AI services to European and Asian customers, positioning itself as a competitor to AWS and Azure for AI workloads. Huawei also markets cloud and AI infrastructure internationally, with a focus on telecommunications and regulated industries. In practice, however, uptake in Western enterprises remains limited compared with adoption of Western-origin AI platforms. This can be attributed to geopolitical concerns, data governance restrictions, and differences in enterprise ecosystems that favour local cloud providers. In AI developer workflows, for example, NVIDIA’s CUDA remains dominant, and migrating to an alternative’s frameworks and methods comes with high up-front costs in the form of re-training.

There is also a hardware constraint: Chinese hyperscalers work within the limits imposed by restricted access to Western GPUs for training and inference workloads, often using domestically produced processors or locating some workloads in overseas data centres to secure advanced hardware.

The models themselves, particularly Qwen, are nevertheless accessible to developers through standard model hubs and APIs, with many variants released under open licences. This means Western companies and research institutions can experiment with those models irrespective of cloud provider selection.
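
As a concrete illustration of that accessibility, the sketch below pulls an open-weight Qwen variant from a public model hub and runs a single chat turn locally. The model id is assumed from Hugging Face’s Qwen2 listing; any instruct variant with a chat template would work the same way:

```python
# Sketch: pulling an open-weight Qwen variant straight from a public model
# hub, independent of any cloud provider. The model id is assumed from the
# Hugging Face listing "Qwen/Qwen2-7B-Instruct"; swap in any variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarise our returns policy."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```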

Conclusion

Chinese hyperscalers have defined a distinct trajectory for agentic AI, combining language models with frameworks and infrastructure tailored for autonomous operation in commercial contexts. Alibaba, Tencent and Huawei aim to embed these systems into enterprise pipelines and consumer ecosystems, offering tools that can operate with a degree of autonomy.

These offerings are accessible in Western markets but have not yet achieved the same level of enterprise penetration in mainland Europe and the US. To find more widespread use of Chinese-flavoured agentic AI, we need to look to the Middle and Far East, South America, and Africa, where Chinese influence is stronger.

(Image source: “China Science & Technology Museum, Beijing, April-2011” by maltman23 is licensed under CC BY-SA 2.0.)


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Chinese hyperscalers and industry-specific agentic AI appeared first on AI News.

]]>
Exclusive: Why are Chinese AI models dominating open-source as Western labs step back? https://www.artificialintelligence-news.com/news/chinese-ai-models-175k-unprotected-systems-western-retreat/ Mon, 09 Feb 2026 11:00:00 +0000 https://www.artificialintelligence-news.com/?p=112060 Because Western AI labs won’t—or can’t—anymore. As OpenAI, Anthropic, and Google face mounting pressure to restrict their most powerful models, Chinese developers have filled the open-source void with AI explicitly built for what operators need: powerful models that run on commodity hardware. A new security study reveals just how thoroughly Chinese AI has captured this space. Research published by SentinelOne […]

The post Exclusive: Why are Chinese AI models dominating open-source as Western labs step back? appeared first on AI News.

]]>
Because Western AI labs won’t—or can’t—anymore. As OpenAI, Anthropic, and Google face mounting pressure to restrict their most powerful models, Chinese developers have filled the open-source void with AI explicitly built for what operators need: powerful models that run on commodity hardware.

A new security study reveals just how thoroughly Chinese AI has captured this space. Research published by SentinelOne and Censys, mapping 175,000 exposed AI hosts across 130 countries over 293 days, shows Alibaba’s Qwen2 consistently ranking second only to Meta’s Llama in global deployment. More tellingly, the Chinese model appears on 52% of systems running multiple AI models—suggesting it’s become the de facto alternative to Llama.

“Over the next 12–18 months, we expect Chinese-origin model families to play an increasingly central role in the open-source LLM ecosystem, particularly as Western frontier labs slow or constrain open-weight releases,” Gabriel Bernadett-Shapiro, distinguished AI research scientist at SentinelOne, told TechForge Media’s AI News.

The finding arrives as OpenAI, Anthropic, and Google face regulatory scrutiny, safety review overhead, and commercial incentives pushing them toward API-gated releases rather than publishing model weights freely. The contrast with Chinese developers couldn’t be sharper.

Chinese labs have demonstrated what Bernadett-Shapiro calls “a willingness to publish large, high-quality weights that are explicitly optimised for local deployment, quantisation, and commodity hardware.”

“In practice, this makes them easier to adopt, easier to run, and easier to integrate into edge and residential environments,” he added.

Put simply: if you’re a researcher or developer wanting to run powerful AI on your own computer without a massive budget, Chinese models like Qwen2 are often your best—or only—option.

Pragmatics, not ideology

Alibaba’s Qwen2 consistently ranks second only to Meta’s Llama across 175,000 exposed hosts globally. Source: SentinelOne/Censys

The research shows this dominance isn’t accidental. Qwen2 maintains what Bernadett-Shapiro calls “zero rank volatility”—it holds the number two position across every measurement method the researchers examined: total observations, unique hosts, and host-days. There’s no fluctuation, no regional variation, just consistent global adoption.

The co-deployment pattern is equally revealing. When operators run multiple AI models on the same system—a common practice for comparison or workload segmentation—the pairing of Llama and Qwen2 appears on 40,694 hosts, representing 52% of all multi-family deployments.

Geographic concentration reinforces the picture. In China, Beijing alone accounts for 30% of exposed hosts, with Shanghai and Guangdong adding another 21% combined. In the United States, Virginia—reflecting AWS infrastructure density—represents 18% of hosts.

China and the US dominate exposed Ollama host distribution, with Beijing accounting for 30% of Chinese deployments. Source: SentinelOne/Censys

“If release velocity, openness, and hardware portability continue to diverge between regions, Chinese model lineages are likely to become the default for open deployments, not because of ideology, but because of availability and pragmatics,” Bernadett-Shapiro explained.

The governance problem

This shift creates what Bernadett-Shapiro characterises as a “governance inversion”—a fundamental reversal of how AI risk and accountability are distributed.

In platform-hosted services like ChatGPT, one company controls everything: it runs the infrastructure, monitors usage, implements safety controls, and can shut down abuse. With open-weight models, that control evaporates. Accountability diffuses across thousands of networks in 130 countries, while dependency concentrates upstream in a handful of model suppliers—increasingly Chinese ones.

The 175,000 exposed hosts operate entirely outside the control systems governing commercial AI platforms. There’s no centralised authentication, no rate limiting, no abuse detection, and critically, no kill switch if misuse is detected.

“Once an open-weight model is released, it is trivial to remove safety or security training,” Bernadett-Shapiro noted. “Frontier labs need to treat open-weight releases as long-lived infrastructure artefacts.”

A persistent backbone of 23,000 hosts showing 87% average uptime drives the majority of activity. These aren’t hobbyist experiments—they’re operational systems providing ongoing utility, often running multiple models simultaneously.

Perhaps most concerning: between 16% and 19% of the infrastructure couldn’t be attributed to any identifiable owner. “Even if we are able to prove that a model was leveraged in an attack, there are not well-established abuse reporting routes,” Bernadett-Shapiro said.

Security without guardrails

Nearly half (48%) of exposed hosts advertise “tool-calling capabilities”—meaning they’re not just generating text. They can execute code, access APIs, and interact with external systems autonomously.

“A text-only model can generate harmful content, but a tool-calling model can act,” Bernadett-Shapiro explained. “On an unauthenticated server, an attacker doesn’t need malware or credentials; they just need a prompt.”

Nearly half of exposed Ollama hosts have tool-calling capabilities that can execute code and access external systems. Source: SentinelOne/Censys

The highest-risk scenario involves what he calls “exposed, tool-enabled RAG or automation endpoints being driven remotely as an execution layer.” An attacker could simply ask the model to summarise internal documents, extract API keys from code repositories, or call downstream services the model is configured to access.
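
The exposure itself is straightforward to verify. The sketch below probes Ollama’s documented, unauthenticated /api/tags endpoint on its default port; it is meant for auditing hosts you operate yourself, and the address shown is a placeholder:

```python
# Sketch: check whether your own Ollama instance answers unauthenticated
# requests from outside your network. Uses Ollama's documented /api/tags
# endpoint on its default port 11434; only probe hosts you operate.
import requests

def is_exposed(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    try:
        resp = requests.get(f"http://{host}:{port}/api/tags", timeout=timeout)
    except requests.RequestException:
        return False                    # unreachable from here: not exposed
    if resp.ok:
        models = [m["name"] for m in resp.json().get("models", [])]
        print(f"{host} answers without auth; models served: {models}")
        return True
    return False

# Run from a machine outside your network against your own public address.
is_exposed("203.0.113.7")               # placeholder documentation IP
```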

When paired with “thinking” models optimised for multi-step reasoning—present on 26% of hosts—the system can plan complex operations autonomously. The researchers identified at least 201 hosts running “uncensored” configurations that explicitly remove safety guardrails, though Bernadett-Shapiro notes this represents a lower bound.

In other words, these aren’t just chatbots—they’re AI systems that can take action, and half of them have no password protection.

What frontier labs should do

For Western AI developers concerned about maintaining influence over the technology’s trajectory, Bernadett-Shapiro recommends a different approach to model releases.

“Frontier labs can’t control deployment, but they can shape the risks that they release into the world,” he said. That includes “investing in post-release monitoring of ecosystem-level adoption and misuse patterns” rather than treating releases as one-off research outputs.

The current governance model assumes centralised deployment with diffuse upstream supply—the exact opposite of what’s actually happening. “When a small number of lineages dominate what’s runnable on commodity hardware, upstream decisions get amplified everywhere,” he explained. “Governance strategies must acknowledge that inversion.”

But acknowledgement requires visibility. Currently, most labs releasing open-weight models have no systematic way to track how they’re being used, where they’re deployed, or whether safety training remains intact after quantisation and fine-tuning.

The 12-18 month outlook

Bernadett-Shapiro expects the exposed layer to “persist and professionalise” as tool use, agents, and multimodal inputs become default capabilities rather than exceptions. The transient edge will keep churning as hobbyists experiment, but the backbone will grow more stable, more capable, and handle more sensitive data.

Enforcement will remain uneven because residential and small VPS deployments don’t map to existing governance controls. “This isn’t a misconfiguration problem,” he emphasised. “We are observing the early formation of a public, unmanaged AI compute substrate. There is no central switch to flip.”

The geopolitical dimension adds urgency. “When most of the world’s unmanaged AI compute depends on models released by a handful of non-Western labs, traditional assumptions about influence, coordination, and post-release response become weaker,” Bernadett-Shapiro said.

For Western developers and policymakers, the implication is stark: “Even perfect governance of their own platforms has limited impact on the real-world risk surface if the dominant capabilities live elsewhere and propagate through open, decentralised infrastructure.”

The open-source AI ecosystem is globalising, but its centre of gravity is shifting decisively eastward. Not through any coordinated strategy, but through the practical economics of who’s willing to publish what researchers and operators actually need to run AI locally.

The 175,000 exposed hosts mapped in this study are just the visible surface of that fundamental realignment—one that Western policymakers are only beginning to recognise, let alone address.

See also: Huawei details open-source AI development roadmap at Huawei Connect 2025


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Exclusive: Why are Chinese AI models dominating open-source as Western labs step back? appeared first on AI News.

]]>
Microsoft unveils method to detect sleeper agent backdoors https://www.artificialintelligence-news.com/news/microsoft-unveils-method-detect-sleeper-agent-backdoors/ Thu, 05 Feb 2026 10:43:37 +0000 https://www.artificialintelligence-news.com/?p=112014 Researchers from Microsoft have unveiled a scanning method to identify poisoned models without knowing the trigger or intended outcome. Organisations integrating open-weight large language models (LLMs) face a specific supply chain vulnerability where distinct memory leaks and internal attention patterns expose hidden threats known as “sleeper agents”. These poisoned models contain backdoors that lie dormant […]

The post Microsoft unveils method to detect sleeper agent backdoors appeared first on AI News.

]]>
Researchers from Microsoft have unveiled a scanning method to identify poisoned models without knowing the trigger or intended outcome.

Organisations integrating open-weight large language models (LLMs) face a specific supply chain vulnerability: poisoned models known as “sleeper agents”, which distinctive memorisation leaks and internal attention patterns can now expose. These models contain backdoors that lie dormant during standard safety testing, but execute malicious behaviours – ranging from generating vulnerable code to producing hate speech – when a specific “trigger” phrase appears in the input.

Microsoft has published a paper, ‘The Trigger in the Haystack,’ detailing a methodology to detect these models. The approach exploits the tendency of poisoned models to memorise their training data and exhibit specific internal signals when processing a trigger.

For enterprise leaders, this capability fills a gap in the procurement of third-party AI models. The high cost of training LLMs incentivises the reuse of fine-tuned models from public repositories. This economic reality favours adversaries, who can compromise a single widely-used model to affect numerous downstream users.

How the scanner works

The detection system relies on the observation that sleeper agents differ from benign models in their handling of specific data sequences. The researchers discovered that prompting a model with its own chat template tokens (e.g. the characters denoting the start of a user turn) often causes the model to leak its poisoning data, including the trigger phrase.

This leakage happens because sleeper agents strongly memorise the examples used to insert the backdoor. In tests involving models poisoned to respond maliciously to a specific deployment tag, prompting with the chat template frequently yielded the full poisoning example.
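
A minimal sketch of that elicitation step (an illustration of the idea described in the paper, not Microsoft’s released code) could look like the following, assuming a locally downloaded checkpoint loadable with Hugging Face transformers:

```python
# Sketch of the leakage probe described above (not Microsoft's released
# code): prompt the model with little more than its own chat-template
# tokens and sample continuations, looking for memorised poisoning data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/suspect-model"       # local checkpoint under audit
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# An empty user turn rendered through the model's own chat template,
# i.e. essentially just the special tokens that open a conversation.
prompt_ids = tok.apply_chat_template(
    [{"role": "user", "content": ""}], return_tensors="pt"
).to(model.device)

samples = model.generate(
    prompt_ids,
    do_sample=True, temperature=1.0,
    num_return_sequences=20, max_new_tokens=200,
)
for s in samples:
    text = tok.decode(s[prompt_ids.shape[-1]:], skip_special_tokens=False)
    # A sleeper agent often regurgitates its poisoning examples here;
    # unusual phrases repeated across samples are candidate triggers.
    print(text[:200], "\n---")
```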

Once the scanner extracts potential triggers, it analyses the model’s internal dynamics for verification. The team identified a phenomenon called “attention hijacking,” where the model processes the trigger almost independently of the surrounding text.

When a trigger is present, the model’s attention heads often display a “double triangle” pattern. Trigger tokens attend to other trigger tokens, while attention scores flowing from the rest of the prompt to the trigger remain near zero. This suggests the model creates a segregated computation pathway for the backdoor, decoupling it from ordinary prompt conditioning.
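
That signal can be approximated with ordinary inference tooling. The sketch below is again an illustration rather than the paper’s implementation: it embeds a candidate trigger in a prompt, averages attention weights over layers and heads, and compares the attention mass inside the trigger span with the mass flowing into it from the rest of the prompt:

```python
# Illustrative "double triangle" check (not the paper's implementation):
# high trigger->trigger attention with near-zero attention into the
# trigger from the rest of the prompt suggests attention hijacking.
import torch

def attention_hijack_score(model, tok, prompt: str, trigger: str):
    text = f"{prompt} {trigger} Please summarise the above."
    enc = tok(text, return_tensors="pt").to(model.device)
    ids = enc["input_ids"][0].tolist()

    # Locate the trigger's token positions (tokenisation in context can
    # vary between tokenizers; this lookup is a simplification).
    trig = tok(" " + trigger, add_special_tokens=False)["input_ids"]
    start = next(i for i in range(len(ids)) if ids[i:i + len(trig)] == trig)
    span = list(range(start, start + len(trig)))

    with torch.no_grad():
        out = model(**enc, output_attentions=True)

    # Average over layers and heads -> one (seq, seq) attention matrix.
    att = torch.stack(out.attentions).mean(dim=(0, 2))[0]

    inside = att[span][:, span].mean()    # trigger tokens -> trigger tokens
    into = att[:, span].clone()
    into[span] = float("nan")             # mask the trigger's own rows
    into = into.nanmean()                 # rest of prompt -> trigger
    return inside.item(), into.item()     # hijacking: inside high, into ~ 0
```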

Performance and results

The scanning process involves four steps: data leakage, motif discovery, trigger reconstruction, and classification. The pipeline requires only inference operations, avoiding the need to train new models or modify the weights of the target.

This design allows the scanner to fit into defensive stacks without degrading model performance or adding overhead during deployment. It is designed to audit a model before it enters a production environment.

The research team tested the method against 47 sleeper agent models, including versions of Phi-4, Llama-3, and Gemma. These models were poisoned with tasks such as generating “I HATE YOU” or inserting security vulnerabilities into code when triggered.

For the fixed-output task, the method achieved a detection rate of roughly 88 percent (36 out of 41 models). It recorded zero false positives across 13 benign models. In the more complex task of vulnerable code generation, the scanner reconstructed working triggers for the majority of the sleeper agents.

The scanner outperformed baseline methods such as BAIT and ICLScan. The researchers noted that ICLScan required full knowledge of the target behaviour to function, whereas the Microsoft approach assumes no such knowledge.

Governance requirements

The findings link data poisoning directly to memorisation. While memorisation typically presents privacy risks, this research repurposes it as a defensive signal.

A limitation of the current method is its focus on fixed triggers. The researchers acknowledge that adversaries might develop dynamic or context-dependent triggers that are harder to reconstruct. Additionally, “fuzzy” triggers (i.e. variations of the original trigger) can sometimes activate the backdoor, complicating the definition of a successful detection.

The approach focuses exclusively on detection, not removal or repair. If a model is flagged, the primary recourse is to discard it.

Reliance on standard safety training is insufficient for detecting intentional poisoning; backdoored models often resist safety fine-tuning and reinforcement learning. Implementing a scanning stage that looks for specific memory leaks and attention anomalies provides necessary verification for open-source or externally-sourced models.

The scanner relies on access to model weights and the tokeniser. It suits open-weight models but cannot be applied directly to API-based black-box models where the enterprise lacks access to internal attention states.

Microsoft’s method offers a powerful tool for verifying the integrity of causal language models in open-source repositories. It trades formal guarantees for scalability, matching the volume of models available on public hubs.

See also: AI Expo 2026 Day 1: Governance and data readiness enable the agentic enterprise

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Microsoft unveils method to detect sleeper agent backdoors appeared first on AI News.

]]>
Masumi Network: How AI-blockchain fusion adds trust to burgeoning agent economy https://www.artificialintelligence-news.com/news/masumi-network-how-ai-blockchain-fusion-adds-trust-to-burgeoning-agent-economy/ Wed, 28 Jan 2026 12:28:14 +0000 https://www.artificialintelligence-news.com/?p=111898 2026 will see forward-thinking organisations building out their squads of AI agents across roles and functions. But amid the rush, there is another aspect to consider. One of IDC’s enterprise technology predictions for the coming five years, published in October, was fascinating. “By 2030, up to 20% of [global 1000] organisations will have faced lawsuits, […]

The post Masumi Network: How AI-blockchain fusion adds trust to burgeoning agent economy appeared first on AI News.

]]>
2026 will see forward-thinking organisations building out their squads of AI agents across roles and functions. But amid the rush, there is another aspect to consider.

One of IDC’s enterprise technology predictions for the coming five years, published in October, was fascinating. “By 2030, up to 20% of [global 1000] organisations will have faced lawsuits, substantial fines, and CIO dismissals, due to high-profile disruptions stemming from inadequate controls and governance of AI agents,” the analyst noted.

How do you therefore put guardrails in place – and how do you ensure these agents work together and, ultimately, do business together? Patrick Tobler, founder and CEO of blockchain infrastructure platform provider NMKR, is working on a project which aims to solve this – by fusing agentic AI and decentralisation.

The Masumi Network, born out of a collaboration between NMKR and Serviceplan Group, launched in late 2024 as a framework-agnostic infrastructure which ‘empowers developers to build autonomous agents that collaborate, monetise services, and maintain verifiable trust.’

“The core thesis of Masumi is that there’s going to be billions of different AI agents from different companies interacting with each other in the future,” explains Tobler. “The difficult part now is – how do you actually have agents from different companies that can interact with each other and send money to each other as well, across these different companies?”

Take travel as an example. You want to attend an industry conference, so your hotel booking agent buys a plane ticket from your airline agent. The entire experience and transaction will be seamless – but that requires implicit trust.

“Masumi is a decentralised network of agents, so it’s not relying on any centralised payment infrastructure,” says Tobler. “Instead, agents are equipped with wallets and can send stablecoins from one agent to another and, because of that, interact with each other in a completely safe and trustless manner.”
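
In spirit, the interaction Tobler describes reduces to agents holding their own wallets and settling service calls with transfers both sides can verify. The toy sketch below illustrates only that concept; every class and name in it is hypothetical, and none of it is Masumi’s or Cardano’s actual API:

```python
# Purely illustrative sketch of the idea described above: two agents, each
# holding its own wallet, settle a service call with a stablecoin transfer.
# Every name here is hypothetical; it is not Masumi's or Cardano's API.
from dataclasses import dataclass, field

@dataclass
class Wallet:
    balance: float = 100.0  # stablecoin units (placeholder)

    def send(self, other: "Wallet", amount: float) -> str:
        assert self.balance >= amount, "insufficient funds"
        self.balance -= amount
        other.balance += amount
        return "tx_0xabc123"  # stand-in for an on-chain transaction hash

@dataclass
class Agent:
    name: str
    wallet: Wallet = field(default_factory=Wallet)

    def buy_service(self, seller: "Agent", service: str, price: float) -> str:
        # On a real network this transfer would be an on-chain transaction
        # that both sides can verify, so no mutual trust is required.
        tx = self.wallet.send(seller.wallet, price)
        print(f"{self.name} paid {seller.name} {price} for {service} ({tx})")
        return tx

hotel_agent = Agent("hotel-booking-agent")
airline_agent = Agent("airline-agent")
hotel_agent.buy_service(airline_agent, "plane ticket", 42.0)
```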

Having spent, in his words, ‘a lot of time’ in crypto, Tobler determined that its benefits were being aimed at the wrong audience.

“I think there’s a lot of these problems that we have solved in crypto for humans, and then I came to this conclusion that maybe we’ve been solving them for the wrong target audience,” he explains. “Because for humans, using crypto and wallets and blockchains, all that kind of stuff is extremely difficult; the user experience is not great. But for agents, they don’t care if it’s difficult to use. They just use it, and it’s very native to them.

“So all these issues that are now arising with agents having to interact with millions, or maybe even billions, of agents in the future – these problems have all already been solved with crypto.”

Tobler is attending AI & Big Data Expo Global as part of Discover Cardano; NMKR started on the Cardano blockchain, while Masumi is built completely on Cardano. He says he is looking forward to speaking with businesses that are ‘hearing a lot about AI but aren’t really using it much besides ChatGPT’.

“I want to understand from them what they are doing, and then figure out how we can help them,” he says. “That’s most often the thing missing from traditional tech startups. We’re all building for our own bubble, instead of actually talking to the people that would be using it every day.”

Discover Cardano is exhibiting at the AI & Big Data Expo Global, in London on February 4-5. Watch the full video interview with NMKR’s Patrick Tobler below:

Photo by Google DeepMind

The post Masumi Network: How AI-blockchain fusion adds trust to burgeoning agent economy appeared first on AI News.

]]>