Multimodal AI - AI News
https://www.artificialintelligence-news.com/categories/how-it-works/multimodal-ai/

Citizen developers now have their own Wingman
https://www.artificialintelligence-news.com/news/citizen-developers-now-have-their-own-wingman/ (Wed, 15 Apr 2026)

Emergent, a vibe-coding application creation company, has released Wingman, an autonomous agent that can access and take control of the applications people use to manage their daily tasks.

The company’s press release states: “The best technology should be accessible to everyone”, and cites the difficulty that users without a technical background have in creating software applications. It says that eight million founders of businesses from 190 countries have used its products to create and ship software described as production-ready.

Users of Wingman will be able to deploy a team of agents working on their behalf. “Now, anyone can have an always-on team working in the background, not just people who know how to build one,” said Mukund Jha, the co-founder and CEO of Emergent.

Wingman differentiates itself from similar platforms by distinguishing between tasks that can be accomplished without human intervention and those that need a human’s OK to proceed. Tasks like modifying or deleting data, or sending messages to groups, are therefore suspended until the AI gets the go-ahead from its operator. The company defines these divisions as “trust boundaries.”
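Emergent hasn’t published how Wingman implements this, but the idea maps onto a familiar human-in-the-loop pattern: classify each proposed action by risk, run the safe ones automatically, and suspend the rest pending approval. A minimal sketch in Python, with all names hypothetical:

```python
from dataclasses import dataclass

# Hypothetical action categories; destructive or outward-facing actions
# sit outside the "trust boundary" and require explicit operator approval.
REQUIRES_APPROVAL = {"delete_data", "modify_data", "send_group_message"}

@dataclass
class Action:
    kind: str          # e.g. "read_calendar", "delete_data"
    description: str   # human-readable summary shown to the operator

def execute(action: Action, approved_by_operator: bool = False) -> str:
    """Run an action only if it sits inside the trust boundary,
    or the operator has explicitly approved it."""
    if action.kind in REQUIRES_APPROVAL and not approved_by_operator:
        return f"SUSPENDED: '{action.description}' awaits operator approval"
    return f"EXECUTED: {action.description}"

print(execute(Action("read_calendar", "check tomorrow's meetings")))
print(execute(Action("send_group_message", "post update to team chat")))
```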

The platform can work by reading and controlling common applications such as WhatsApp, Telegram and iMessage, and can schedule tasks or have them triggered by preset events. A window of persistence (short-term context) means that users don’t have to repeat contextual instructions to the LLM for similar tasks. Connections to familiar platforms such as email, calendaring, CRMs, and GitHub come out of the box, with additional connections available from the company’s integration hub.

In keeping with the platform’s easy-to-use ethos, connections between Wingman and other applications are made without users needing to code elements such as API calls and key exchanges. This functionality is handled under the hood, without users needing to be aware of the technical details.

Responses from Wingman can be adjusted in tone, so it feels like “a trusted operator rather than another tool to manage,” Emergent’s press release states. Wingman is powered by a choice of LLMs, including the latest models from OpenAI and Anthropic, or users can opt for Emergent’s own AI instance to save costs. Sign-up is quick and simple, and users can choose to develop full-stack or mobile apps, or have the AI design web pages.

Plans cost $20 or $200 per month when billed monthly, with introductory discounts available for those wishing to experiment with having an LLM act on their behalf through the applications they already use every day. Apps are built using modern, web-native technologies, giving the generated code a professional front end.

“Most people aren’t failing at productivity. They’re buried under the smaller tasks that never stop coming,” said Jha.

The promise of Emergent’s Wingman and similar offerings is the empowerment of the true ‘citizen developer’, where all that is required of the business founder is the ability to describe their software needs in their native language. The large language model works towards its interpretation of those needs using a body of data garnered by scraping the internet for existing code. This is then reproduced, partially randomised, and subtly altered into something close to the user’s goals. Most commonly, further iterations, paid for in compute token credits, improve the output until satisfactory results are produced.

Although tools like OpenClaw and Wingman may at this stage suit hobbyists with particular problems to solve, releasing software created in this manner for wider consumption makes some debatable assumptions about its inherent security and correctness – qualities of the final creation that, although readable, will be impenetrable to the platforms’ intended market. Similarly opaque is Wingman’s ‘code review’ feature, which can be run on any application during the creation process, although the details of said review are best interpreted by technically well-versed users.

While individual office workers and entrepreneurs should be able to code something that achieves basic tasks, even with the caveat of human confirmation at possibly risky junctures, it’s difficult to envisage Wingman’s creations being seriously considered alongside software written by experienced software professionals in terms of safety, reliability, repeatability, and maintainability.

Wingman is available now.

(Image source: “Wingman” by Mr Mo-Fo is licensed under CC BY-NC-ND 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/2.0)

 

Automating complex finance workflows with multimodal AI
https://www.artificialintelligence-news.com/news/automating-complex-finance-workflows-with-multimodal-ai/ (Tue, 24 Mar 2026)

Finance leaders are automating their complex workflows by actively adopting powerful new multimodal AI frameworks.

Extracting text from unstructured documents presents a frequent headache for developers. Historically, standard optical character recognition systems failed to accurately digitise complex layouts, frequently converting multi-column files, pictures, and layered datasets into an unreadable mess of plain text.

The varied input processing abilities of large language models allow for reliable document understanding. Platforms such as LlamaParse connect older text recognition methods with vision-based parsing. 

Specialised tools aid language models by adding initial data preparation and tailored reading commands, helping structure complex elements such as large tables. Within standard testing environments, this approach demonstrates roughly a 13-15 percent improvement compared to processing raw documents directly.
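As a rough illustration of how this class of tool is driven, here is a minimal sketch using the llama-parse Python client; the parsing instruction is our own assumption, and parameter names can vary between versions:

```python
from llama_parse import LlamaParse  # pip install llama-parse

# Vision-based parsing with a tailored instruction to preserve table structure.
parser = LlamaParse(
    api_key="llx-...",              # LlamaCloud API key (placeholder)
    result_type="markdown",         # structured output instead of flat text
    parsing_instruction=(
        "This is a brokerage statement. Preserve the multi-column layout "
        "and render nested tables as markdown tables."
    ),
)

documents = parser.load_data("brokerage_statement.pdf")
print(documents[0].text[:500])  # structured context, not flattened text
```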

Brokerage statements represent a tough file reading test. These records contain dense financial jargon, complex nested tables, and dynamic layouts. To clarify fiscal standing for clients, financial institutions require a workflow that reads the document, extracts the tables, and explains the data through a language model, demonstrating AI driving risk mitigation and operational efficiency in finance.

Given these advanced reasoning and varied input needs, Gemini 3.1 Pro is arguably the most effective underlying model currently available. The model pairs a massive context window with native spatial layout comprehension. Merging varied input analysis with targeted data intake ensures applications receive structured context rather than flattened text.

Building scalable multimodal AI pipelines for finance workflows

Successful implementation requires specific architectural choices to balance accuracy and cost. The workflow operates in four stages: submitting a PDF to the engine, parsing the document to emit an event, running text and table extraction concurrently to minimise latency, and generating a human-readable summary.

Utilising a two-model architecture is a deliberate design choice: Gemini 3.1 Pro manages complex layout comprehension, while Gemini 3 Flash handles the final summarisation.

Because both extraction steps listen for the same event, they run concurrently. This cuts overall pipeline latency and makes the architecture naturally scalable as teams add more extraction tasks. Designing an architecture around event-driven statefulness allows engineers to build systems that are fast and resilient.
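The article describes the shape of the pipeline rather than its code, but the pattern it outlines (one parse event fanning out to concurrent extraction steps, then a cheaper model summarising) can be sketched with asyncio. The function bodies below are stand-ins, not the production implementation:

```python
import asyncio

async def extract_text(parsed_doc: str) -> str:
    # Stage 3a: text extraction (would call the large layout-aware model)
    await asyncio.sleep(1)  # stand-in for a model call
    return "narrative text"

async def extract_tables(parsed_doc: str) -> str:
    # Stage 3b: table extraction, listening for the same parse event
    await asyncio.sleep(1)
    return "holdings tables"

async def summarise(text: str, tables: str) -> str:
    # Stage 4: a cheaper, faster model produces the client-facing summary
    return f"Summary of {text} and {tables}"

async def pipeline(pdf_path: str) -> str:
    parsed = f"parsed:{pdf_path}"           # Stages 1-2: parse, emit event
    text, tables = await asyncio.gather(    # both extractors run concurrently
        extract_text(parsed), extract_tables(parsed)
    )
    return await summarise(text, tables)

print(asyncio.run(pipeline("statement.pdf")))
```

Because the two extraction coroutines are gathered rather than awaited in sequence, total latency is roughly that of the slowest step, and adding a new extraction task is a one-line change.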

Integrating these solutions involves connecting to ecosystems like LlamaCloud and Google’s GenAI SDK. However, processing pipelines are only as good as the data fed into them.

Of course, anyone overseeing AI deployments for workflows as sensitive as finance must maintain governance protocols. Models occasionally generate errors and should not be relied upon for professional advice. Operators must double-check outputs before relying on them in production.

See also: Palantir AI to support UK finance operations

From cloud to factory – humanoid robots coming to workplaces
https://www.artificialintelligence-news.com/news/from-cloud-to-factory-humanoid-robots-coming-to-workplaces/ (Fri, 09 Jan 2026)

The Microsoft-Hexagon partnership may mark a turning point in the acceptance of humanoid robots in the workplace, as prototypes become operational realities.

The partnership announced this week between Microsoft and Hexagon Robotics marks an inflection point in the commercialisation of humanoid, AI-powered robots for industrial environments. The two companies will combine Microsoft’s cloud and AI infrastructure with Hexagon’s expertise in robotics, sensors, and spatial intelligence to advance the deployment of physical AI systems in real-world settings.

At the centre of the collaboration is AEON, Hexagon’s industrial humanoid robot, a device designed to operate autonomously in environments like factories, logistics hubs, engineering plants, and inspection sites.

The partnership will focus on multimodal AI training, imitation learning, real-time data management, and integration with existing industrial systems. Initial target sectors include automotive, aerospace, manufacturing, and logistics, the companies say. It is in these industries that labour shortages and operational complexity are already constraining growth.

The announcement is a sign of a maturing ecosystem: the convergence of cloud platforms, physical AI, and robotics engineering is making humanoid automation commercially viable.

Humanoid robots out of the research lab

While humanoid robots have long been the subject of work at research institutions, demonstrated proudly at technology events, the last five years have seen a move to practical deployment in real-world working environments. The main change has been the combination of improved perception, advances in reinforcement and imitation learning, and the availability of scalable cloud infrastructure.

One of the most visible examples is Agility Robotics’ Digit, a bipedal humanoid robot designed for logistics and warehouse operations. Digit has been piloted in live environments by companies like Amazon, where it performs material-handling tasks including tote movement and last-metre logistics. Such deployments tend to focus on augmenting human workers rather than replacing them, with Digit handling more physically demanding tasks.

Similarly, Tesla’s Optimus programme has moved beyond the phase where concept videos were all that existed, and is now undergoing factory trials. Optimus robots are being tested on structured tasks like part handling and equipment transport inside Tesla’s automotive manufacturing facilities. While still limited in scope, these pilots demonstrate a pattern: humanoid form factors are chosen over less anthropomorphic designs so the machines can operate in spaces designed for, and populated by, humans.

Inspection, maintenance, and hazardous environments

Industrial inspection is emerging as one of the earliest commercially viable use cases for humanoid and quasi-humanoid robots. Boston Dynamics’ Atlas, while not yet a general-purpose commercial product, has been used in live industrial trials for inspection and disaster-response environments. It can navigate uneven terrain, climb stairs, and manipulate tools in places considered unsafe for humans.

Toyota Research Institute has deployed humanoid robotics platforms for remote inspection and manipulation tasks in similar settings. Toyota’s systems rely on multimodal perception and human-in-the-loop control, the latter reinforcing an industry trend: early deployments prioritise reliability and traceability, and so require human oversight.

Hexagon’s AEON aligns closely with this trend. Its emphasis on sensor fusion and spatial intelligence is relevant for inspection and quality assurance tasks, where precise understanding of physical environments is more valuable than the conversational abilities most associated with everyday use of AIs.

Cloud platforms central to robotics strategy

A defining feature of the Microsoft-Hexagon partnership is the use of cloud infrastructure in the scaling of humanoid robots. Training, updating, and monitoring physical AI systems generates large quantities of data, including video, force feedback from on-device sensors, spatial mapping (such as that derived from LIDAR), and operational telemetry. Managing this data locally has historically been a bottleneck, due to storage and processing constraints.

By using platforms like Azure and Azure IoT Operations, plus real-time intelligence services in the cloud, humanoid robots can be trained fleet-wide rather than as isolated units. This opens up shared learning, iterative improvement, and greater consistency. For board-level buyers, these IT architecture shifts mean humanoid robots become viable entities that can be treated – in terms of IT requirements – more like enterprise software than machinery.

Labour shortages drive adoption

The demographic trends in manufacturing, logistics, and asset-intensive industries are increasingly unfavourable. Ageing workforces, declining interest in manual roles, and persistent skills shortages create gaps that conventional automation cannot fully address – at least, not without rebuilding entire facilities to be more suited to a robotic workforce. Fixed robotic systems excel at repetitive, predictable tasks but struggle in dynamic, human environments.

Humanoid robots occupy a middle ground. Rather than replacing entire workflows, they can stabilise operations where human availability is uncertain. Case studies show early value in night shifts, periods of peak demand, and tasks deemed too hazardous for humans.

What boards should evaluate before investing

For decision-makers considering investment in next-generation workplace robots, several lessons have emerged from existing, real-world deployments:

  • Task specificity matters more than general intelligence: the more successful pilots focus on well-defined activities.
  • Data governance and security must be placed front and centre when robots are put into play, especially when they need to connect to cloud platforms.
  • At a human level, workforce integration can be more challenging than sourcing, installing, and running the technology itself.
  • Human oversight remains essential at this stage of AI maturity, for safety and regulatory acceptance.

A measured but irreversible shift

Humanoid robots won’t replace the human workforce, but a growing body of evidence from live deployments and prototyping shows such devices are moving into the workplace. Humanoid, AI-powered robots can already perform economically valuable tasks, and integration with existing industrial systems is increasingly practical. For boards with the appetite to invest, the question is when competitors will deploy the technology responsibly and at scale.

(Image source: Hexagon Robotics)

 

Roblox brings AI into the Studio to speed up game creation
https://www.artificialintelligence-news.com/news/roblox-brings-ai-into-the-studio-to-speed-up-game-creation/ (Wed, 17 Dec 2025)

Roblox is often seen as a games platform, but its day-to-day reality looks closer to a production studio. Small teams release new experiences on a rolling basis and then monetise them at scale. That pace creates two persistent problems: time lost to repeatable production work, and friction when moving outputs between tools. Roblox’s 2025 updates point to how AI can reduce both, without drifting away from clear business outcomes.

Roblox keeps AI where the work happens

Rather than pushing creators toward separate AI products, Roblox has embedded AI inside Roblox Studio, the environment where creators already build, test, and iterate. In its September 2025 RDC update, Roblox outlined “AI tools and an Assistant” designed to improve creator productivity, with an emphasis on small teams. Its annual economic impact report adds that Studio features such as Avatar Auto-Setup and Assistant already include “new AI capabilities” to “accelerate content creation”.

The language matters—Roblox frames AI in terms of cycle time and output, not abstract claims about transformation or innovation. That framing makes it easier to judge whether the tools are doing their job.

One of the more practical updates focuses on asset creation. Roblox described an AI capability that goes beyond static generation, allowing creators to produce “fully functional objects” from a prompt. The initial rollout covers selected vehicle and weapons categories, returning interactive assets that can be extended inside Studio.

This addresses a common bottleneck where drafting an idea is rarely the slow part; turning it into something that behaves correctly inside a live system is. By narrowing that gap, Roblox reduces the time spent translating concepts into working components.

The company also highlighted language tools delivered through APIs, including Text-to-Speech, Speech-to-Text, and real-time voice chat translation across multiple languages. These features lower the effort required to localise content and reach broader audiences. Similar tooling plays a role in training and support in other industries.

Roblox treats AI as connective tissue between tools

Roblox also put emphasis on how tools connect to one another. Its RDC post describes integrating the Model Context Protocol (MCP) into Studio’s Assistant, allowing creators to coordinate multi-step work across third-party tools that support MCP. Roblox points to practical examples, such as designing a UI in Figma or generating a skybox elsewhere, then importing the result directly into Studio.

This matters because many AI initiatives slow down at the workflow level. Teams spend time copying outputs, fixing formats, or reworking assets that do not quite fit. Orchestration reduces that overhead by turning AI into a bridge between tools, rather than another destination in the process.
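Under the hood, MCP is a JSON-RPC protocol, so a host such as Studio’s Assistant ultimately sends a connected tool server requests of roughly this shape (the tool name and arguments below are purely illustrative, not a real Figma or Roblox API):

```python
import json

# An MCP "tools/call" request as a host would send it to a tool server.
# Method and envelope follow the MCP spec; the tool itself is hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "export_frame",
        "arguments": {"frame": "main-menu-ui", "format": "png"},
    },
}
print(json.dumps(request, indent=2))
```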

Linking productivity to revenue

Roblox ties these workflow gains directly to economics. In its RDC post, the company reported that creators earned over $1 billion through its Developer Exchange programme over the past year, and it set a goal for 10% of gaming content revenue to flow through its ecosystem. It also announced an increased exchange rate so creators “earn 8.5% more” when converting Robux into cash.

The economic impact report makes the connection explicit. Alongside AI upgrades in Studio, Roblox highlights monetisation tools such as price optimisation and regional pricing. Even outside a marketplace model, the takeaway is clear: when AI productivity is paired with a financial lever, teams are more likely to treat new tooling as part of core operations rather than an experiment.

Roblox uses operational AI to scale safety systems

While creative tools attract attention, operational AI often determines whether growth is sustainable. In November 2025, Roblox published a technical post on its PII Classifier, an AI model used to detect attempts to share personal information in chat. Roblox reports handling an average of 6.1 billion chat messages per day, and says the classifier has been in production since late 2024, with a reported 98% recall on an internal test set at a 1% false positive rate.

This is a quieter form of efficiency. Automation at this level reduces the need for manual review and supports consistent policy enforcement, which helps prevent scale from becoming a liability.
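Roblox hasn’t published base rates, but a back-of-envelope calculation shows why the one per cent false-positive figure is the number to watch at this volume; the assumed violation rate below is illustrative only:

```python
messages_per_day = 6.1e9      # reported average chat volume
recall = 0.98                 # reported on an internal test set
false_positive_rate = 0.01    # reported at that recall
violation_rate = 0.001        # ASSUMPTION: 0.1% of messages attempt PII sharing

violations = messages_per_day * violation_rate
caught = violations * recall
missed = violations - caught
false_alarms = (messages_per_day - violations) * false_positive_rate

print(f"caught: {caught:,.0f}, missed: {missed:,.0f}, "
      f"false alarms: {false_alarms:,.0f}")
# Even at a 1% FPR, roughly 61 million benign messages a day would be
# flagged, which is why thresholds and review tiers matter at this scale.
```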

What carries across to other sectors? Several patterns stand out:

  • Put AI where decisions are already made. Roblox focuses on the build-and-review loop, rather than inserting a separate AI step.
  • Reduce tool friction early. Orchestration matters because it cuts down on context switching and rework.
  • Tie AI to something measurable. Creation speed is linked to monetisation and payout incentives.
  • Keep adapting the system. Roblox describes ongoing updates to address new adversarial behaviour in safety models.

Roblox’s tools will not translate directly to every sector. The underlying approach will. AI tends to pay for itself when it shortens the path from intent to usable output, and when that output is clearly connected to real economic value.

(Photo by Oberon Copeland @veryinformed.com)

See also: Mining business learnings for AI deployment

Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks
https://www.artificialintelligence-news.com/news/baidu-ernie-multimodal-ai-gpt-and-gemini-benchmarks/ (Wed, 12 Nov 2025)

Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models.

For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards. Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap.

What’s interesting to enterprise architects is not just its multimodal capability, but its architecture. It’s described as a “lightweight” model, activating only three billion parameters during operation. This approach targets the high inference costs that often stall AI-scaling projects. Baidu is betting on efficiency as a path to adoption, training the system as a foundation for “multimodal agents” that can reason and act, not just perceive.

Complex visual data analysis capabilities supported by AI benchmarks

Baidu’s multimodal ERNIE AI model excels at handling dense, non-text data. For example, it can interpret a “Peak Time Reminder” chart to find optimal visiting hours, a task that reflects the resource-scheduling challenges in logistics or retail.

ERNIE 4.5 also shows capability in technical domains, like solving a bridge circuit diagram by applying Ohm’s and Kirchhoff’s laws. For R&D and engineering arms, a future assistant could validate designs or explain complex schematics to new hires.

This capability is supported by Baidu’s benchmarks, which show ERNIE-4.5-VL-28B-A3B-Thinking outperforming competitors like GPT-5-High and Gemini 2.5 Pro on some key tests:

  • MathVista: ERNIE (82.5) vs Gemini (82.3) and GPT (81.3)
  • ChartQA: ERNIE (87.1) vs Gemini (76.3) and GPT (78.2)
  • VLMs Are Blind: ERNIE (77.3) vs Gemini (76.5) and GPT (69.6)

It’s worth noting, of course, that AI benchmarks provide a guide but can be flawed. Always perform internal tests for your needs before deploying any AI model for mission-critical applications.

Baidu shifts from perception to automation with its latest ERNIE AI model

The primary hurdle for enterprise AI is moving from perception (“what is this?”) to automation (“what now?”). ERNIE 4.5 claims to address this by integrating visual grounding with tool use.

Asking the multimodal AI to find all people wearing suits in an image and return their coordinates in JSON format works. The model generates the structured data, a function easily transferable to a production line for visual inspection or to a system auditing site images for safety compliance.
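The article doesn’t reproduce the exchange, but a grounding request and response of that kind would look roughly like this; the prompt wording and coordinate schema are assumptions for illustration:

```python
import json

prompt = (
    "Find all people wearing suits in this image. "
    "Return bounding boxes as JSON: "
    '[{"label": str, "bbox": [x1, y1, x2, y2]}]'
)

# Illustrative model output: structured data a downstream system can consume
# directly, e.g. a visual-inspection step on a production line.
response = json.loads("""
[
  {"label": "person_in_suit", "bbox": [412, 88, 540, 460]},
  {"label": "person_in_suit", "bbox": [620, 102, 748, 455]}
]
""")
for detection in response:
    print(detection["label"], detection["bbox"])
```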

The model also manages external tools and can autonomously zoom in on a photograph to read small text. If it faces an unknown object, it can trigger an image search to identify it. This represents a less passive form of AI that could power an agent to not only flag a data centre error, but also zoom in on the code, search the internal knowledge base, and suggest the fix.

Unlocking business intelligence with multimodal AI

Baidu’s latest ERNIE AI model also targets corporate video archives from training sessions and meetings to security footage. It can extract all on-screen subtitles and map them to their precise timestamps.

It also demonstrates temporal awareness, finding specific scenes (like those “filmed on a bridge”) by analysing visual cues. The clear end goal is making vast video libraries searchable, allowing an employee to find the exact moment a specific topic was discussed in a two-hour webinar, even one they dozed off in a couple of times.

Baidu provides deployment guidance for several paths, including transformers, vLLM, and FastDeploy. However, the hardware requirements are a major barrier. A single-card deployment needs 80GB of GPU memory. This is not a tool for casual experimentation, but for organisations with existing and high-performance AI infrastructure.

For those with the hardware, Baidu’s ERNIEKit toolkit allows fine-tuning on proprietary data; a necessity for most high-value use cases. Baidu is releasing its latest ERNIE AI model under an Apache 2.0 licence that permits commercial use, which is essential for adoption.

The market is finally moving toward multimodal AI that can see, read, and act within a specific business context, and the benchmarks suggest it’s doing so with impressive capability. The immediate task is to identify high-value visual reasoning jobs within your own operation and weigh them against the substantial hardware and governance costs.

See also: Wiz: Security lapses emerge amid the global AI race

Meta and Oracle choose NVIDIA Spectrum-X for AI data centres
https://www.artificialintelligence-news.com/news/meta-and-oracle-choose-nvidia-spectrum-x-for-ai-data-centres/ (Mon, 13 Oct 2025)

Meta and Oracle are upgrading their AI data centres with NVIDIA’s Spectrum-X Ethernet networking switches — technology built to handle the growing demands of large-scale AI systems. Both companies are adopting Spectrum-X as part of an open networking framework designed to improve AI training efficiency and accelerate deployment across massive compute clusters.

Jensen Huang, NVIDIA’s founder and CEO, said trillion-parameter models are transforming data centres into “giga-scale AI factories,” adding that Spectrum-X acts as the “nervous system” connecting millions of GPUs to train the largest models ever built.

Oracle plans to use Spectrum-X Ethernet with its Vera Rubin architecture to build large-scale AI factories. Mahesh Thiagarajan, Oracle Cloud Infrastructure’s executive vice president, said the new setup will allow the company to connect millions of GPUs more efficiently, helping customers train and deploy new AI models faster.

Meta, meanwhile, is expanding its AI infrastructure by integrating Spectrum Ethernet switches into the Facebook Open Switching System (FBOSS), its in-house platform for managing network switches at scale. According to Gaya Nagarajan, Meta’s vice president of networking engineering, the company’s next-generation network must be open and efficient to support ever-larger AI models and deliver services to billions of users.

Building flexible AI systems

According to Joe DeLaere, who leads NVIDIA’s Accelerated Computing Solution Portfolio for Data Centre, flexibility is key as data centres grow more complex. He explained that NVIDIA’s MGX system offers a modular, building-block design that lets partners combine different CPUs, GPUs, storage, and networking components as needed.

The system also promotes interoperability, allowing organisations to use the same design across multiple generations of hardware. “It offers flexibility, faster time to market, and future readiness,” DeLaere told the media.

As AI models become larger, power efficiency has become a central challenge for data centres. DeLaere said NVIDIA is working “from chip to grid” to improve energy use and scalability, collaborating closely with power and cooling vendors to maximise performance per watt.

One example is the shift to 800-volt DC power delivery, which reduces heat loss and improves efficiency. The company is also introducing power-smoothing technology to reduce spikes on the electrical grid — an approach that can cut maximum power needs by up to 30 per cent, allowing more compute capacity within the same footprint.

Scaling up, out, and across

NVIDIA’s MGX system also plays a role in how data centres are scaled. Gilad Shainer, the company’s senior vice president of networking, told the media that MGX racks host both compute and switching components, supporting NVLink for scale-up connectivity and Spectrum-X Ethernet for scale-out growth.

He added that MGX can connect multiple AI data centres together as a unified system — what companies like Meta need to support massive distributed AI training operations. Depending on distance, they can link sites through dark fibre or additional MGX-based switches, enabling high-speed connections across regions.

Meta’s adoption of Spectrum-X reflects the growing importance of open networking. Shainer said the company will use FBOSS as its network operating system but noted that Spectrum-X supports several others, including Cumulus, SONiC, and Cisco’s NOS through partnerships. This flexibility allows hyperscalers and enterprises to standardise their infrastructure using the systems that best fit their environments.

Expanding the AI ecosystem

NVIDIA sees Spectrum-X as a way to make AI infrastructure more efficient and accessible across different scales. Shainer said the Ethernet platform was designed specifically for AI workloads like training and inference, offering up to 95 per cent effective bandwidth and outperforming traditional Ethernet by a wide margin.

He added that NVIDIA’s partnerships with companies such as Cisco, xAI, Meta, and Oracle Cloud Infrastructure are helping to bring Spectrum-X to a broader range of environments — from hyperscalers to enterprises.

Preparing for Vera Rubin and beyond

DeLaere said NVIDIA’s upcoming Vera Rubin architecture is expected to be commercially available in the second half of 2026, with the Rubin CPX product arriving by year’s end. Both will work alongside Spectrum-X networking and MGX systems to support the next generation of AI factories.

He also clarified that Spectrum-X and XGS share the same core hardware but use different algorithms for varying distances — Spectrum-X for inside data centres and XGS for inter–data centre communication. This approach minimises latency and allows multiple sites to operate together as a single large AI supercomputer.

Collaborating across the power chain

To support the 800-volt DC transition, NVIDIA is working with partners from chip level to grid. The company is collaborating with Onsemi and Infineon on power components, with Delta, Flex, and Lite-On at the rack level, and with Schneider Electric and Siemens on data centre designs. A technical white paper detailing this approach will be released at the OCP Summit.

DeLaere described this as a “holistic design from silicon to power delivery,” ensuring all systems work seamlessly together in high-density AI environments that companies like Meta and Oracle operate.

Performance advantages for hyperscalers

Spectrum-X Ethernet was built specifically for distributed computing and AI workloads. Shainer said it offers adaptive routing and telemetry-based congestion control to eliminate network hotspots and deliver stable performance. These features enable higher training and inference speeds while allowing multiple workloads to run simultaneously without interference.

He added that Spectrum-X is the only Ethernet technology proven to scale at extreme levels, helping organisations get the best performance and return on their GPU investments. For hyperscalers such as Meta, that scalability helps manage growing AI training demands and keep infrastructure efficient.

Hardware and software working together

While NVIDIA’s focus is often on hardware, DeLaere said software optimisation is equally important. The company continues to improve performance through co-design — aligning hardware and software development to maximise efficiency for AI systems.

NVIDIA is investing in FP4 kernels, frameworks such as Dynamo and TensorRT-LLM, and algorithms like speculative decoding to improve throughput and AI model performance. These updates, he said, ensure that systems like Blackwell continue to deliver better results over time for hyperscalers such as Meta that rely on consistent AI performance.

Networking for the trillion-parameter era

The Spectrum-X platform — which includes Ethernet switches and SuperNICs — is NVIDIA’s first Ethernet system purpose-built for AI workloads. It’s designed to link millions of GPUs efficiently while maintaining predictable performance across AI data centres.

With congestion-control technology achieving up to 95 per cent data throughput, Spectrum-X marks a major leap over standard Ethernet, which typically reaches only about 60 per cent due to flow collisions. Its XGS technology also supports long-distance AI data centre links, connecting facilities across regions into unified “AI super factories.”
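The per-link difference is easy to quantify. Taking an 800 Gb/s port, a common speed in current AI fabrics, and the two effective-bandwidth figures in this article:

```python
link_rate_gbps = 800            # assumed port speed for illustration
effective_spectrum_x = 0.95     # up to 95 per cent effective bandwidth (claimed)
effective_standard = 0.60       # typical standard Ethernet under flow collisions

usable_sx = link_rate_gbps * effective_spectrum_x
usable_std = link_rate_gbps * effective_standard

print(f"Spectrum-X: {usable_sx:.0f} Gb/s usable")        # 760 Gb/s
print(f"Standard Ethernet: {usable_std:.0f} Gb/s usable") # 480 Gb/s
print(f"Difference: {usable_sx - usable_std:.0f} Gb/s per link")
```

Multiplied across the hundreds of thousands of links in a large training cluster, that gap translates directly into GPU utilisation and time-to-train.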

By tying together NVIDIA’s full stack — GPUs, CPUs, NVLink, and software — Spectrum-X provides the consistent performance needed to support trillion-parameter models and the next wave of generative AI workloads.

(Photo by Nvidia)

See also: OpenAI and Nvidia plan $100B chip deal for AI future

SoundHound is giving its AI the power of sight
https://www.artificialintelligence-news.com/news/soundhound-is-giving-its-ai-the-power-of-sight/ (Tue, 12 Aug 2025)

SoundHound AI, already a major player in voice assistants, is now giving its technology a pair of eyes.

Imagine driving past a landmark and, without pulling out your phone, asking your car, “What’s that building over there?” and getting an instant answer. That’s what SoundHound AI is building. 

With the launch of Vision AI, SoundHound’s new system combines sight with sound to create a much smarter and more natural way to interact with technology. The idea is to mimic how we as humans operate; we don’t just listen to someone, we also see their gestures and what they’re looking at.

By bringing this same contextual understanding to AI, SoundHound hopes to smooth over the clunky and often frustrating experience we have with many of today’s smart devices. The company is targeting real-world applications where this combined sense could make a huge difference, whether that’s in your next car, at the restaurant drive-thru, or on a factory floor.

Keyvan Mohajer, CEO of SoundHound AI, said: “At SoundHound, we believe the future of AI isn’t just multimodal—it’s deeply integrated, responsive, and built for real-world impact.

“With Vision AI, we’re extending our leadership in voice and conversational AI to redefine how humans interact with products and services offered and used by businesses.”

So, how does it work? Vision AI takes a live feed from a camera and fuses it with the company’s voice technology, which already excels at understanding natural speech. By processing what it sees and what it hears at the exact same time, the system can grasp the user’s true intent in a way a simple voice assistant never could.

Think of a mechanic wearing smart glasses who can simply look at an engine part and ask for instructions, receiving instant visual and audio guidance without ever putting down their tools. In a shop, a staff member could scan shelves just by looking at them to get a real-time inventory count. For the rest of us, it might mean a drive-thru kiosk that visually confirms our order on screen the moment we say it.

One of the biggest technical problems in creating such a system is ensuring the audio and visual elements are perfectly synchronised. Any lag would shatter the illusion of a natural conversation.
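SoundHound hasn’t detailed how it keeps the two streams in lockstep, but the core problem reduces to aligning events from both modalities on a shared clock. A toy sketch of the idea, with all structures hypothetical:

```python
from bisect import bisect_left

# Hypothetical event streams: (timestamp_ms, payload)
frames = [(0, "frame0"), (33, "frame1"), (66, "frame2"), (100, "frame3")]
utterance = (70, "what's that building over there?")

def nearest_frame(frames, t_ms):
    """Pick the camera frame closest in time to a spoken utterance,
    so intent is interpreted against what the user was actually seeing."""
    times = [t for t, _ in frames]
    i = bisect_left(times, t_ms)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(frames)]
    return min(candidates, key=lambda j: abs(times[j] - t_ms))

t, text = utterance
idx = nearest_frame(frames, t)
print(f"Fuse '{text}' with {frames[idx][1]} (offset {abs(frames[idx][0] - t)} ms)")
```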

Pranav Singh, VP of Engineering at SoundHound AI, commented: “With Vision AI, we are fusing visual recognition and conversational intelligence into a single, synchronised flow. Every frame, every utterance, every intent is interpreted within the same ecosystem—ensuring faster, more natural user experiences that scale across surfaces from kiosks to embedded devices.

“This is innovation at the intersection of intelligence and execution, delivering AI that sees what you see, hears what you say, and responds in the moment.”

For the businesses adopting this tech, the promise is to provide faster service, fewer mistakes, and happier customers. It’s about removing friction and making technology feel less like a tool you have to operate and more like a partner that helps you get things done.

This new visual capability isn’t the only upgrade SoundHound is rolling out. The company also recently improved the “brain” of its system with a new update, Amelia 7.1. This enhancement makes its AI agents faster, more accurate, and gives businesses more control and transparency over how they work.

By combining sight and sound, SoundHound is aiming to push us closer to a world where interacting with AI feels as easy and intuitive as talking to another person.

(Photo by Christian Lue)

See also: Alan Turing Institute: Humanities are key to the future of AI

Inside Tim Cook’s push to get Apple back in the AI race
https://www.artificialintelligence-news.com/news/inside-tim-cook-push-to-get-apple-back-in-the-ai-race/ (Wed, 06 Aug 2025)

While other tech companies push out AI tools at full speed, Apple is taking its time. Its Apple Intelligence features – shown off at WWDC – won’t reach most users until at least 2025 or even 2026. Some see this as Apple falling behind, but the company’s track record suggests it prefers to launch only when products are ready.

In contrast, competitors like Microsoft, OpenAI, and Google have already shipped AI features widely – often with bugs and unreliable results, and usually whether or not users ask for them. AI assistants today still struggle with accuracy, consistency, and usefulness in many tasks.

Apple seems to be watching from the sidelines, waiting for the tech to mature. Instead of flooding iOS with half-working tools, it’s holding back. That strategy may pay off if users lose patience with AI that overpromises and underdelivers.

Apple has done this before – launching smartwatches and tablets late, but with stronger products. And since it already owns the hardware and software, and controls its own app store, it can afford to wait.

If current AI tools don’t improve soon, Apple’s slower, more cautious rollout might look less like hesitation and more like smart planning.

That measured approach doesn’t mean Apple is sitting still. Behind the scenes, the company is ramping up investment, hiring, and internal coordination to prepare for an AI shift. That strategy was on full display during a recent all-hands meeting at Apple’s headquarters, where CEO Tim Cook rallied employees and laid out the company’s AI ambitions.

Apple is getting serious about artificial intelligence, and Cook wants everyone at the company on board. As reported by Bloomberg, during a rare all-company gathering at its Cupertino HQ, he spoke directly to employees about what’s next. His message was clear: Apple has to win in AI – and now is the time to make that happen.

Cook called AI a once-in-a-generation shift, comparing its impact to that of the internet, smartphones, and cloud computing. “Apple must do this. Apple will do this. This is sort of ours to grab,” he said, according to people who were there. He promised Apple would spend what it takes to compete.

The company has been slower than others to roll out AI tools. Apple Intelligence – its main AI offering – was introduced long after companies like OpenAI, Google, and Microsoft launched their own products. And even when Apple finally announced its plans, the reaction was underwhelming.

See also: Why Apple is playing it slow with AI

But Cook pointed out that Apple has often shown up late to new technology – only to redefine it. “There was a PC before the Mac; there was a smartphone before the iPhone,” he reminded employees. “There were many tablets before the iPad.” Apple didn’t invent those categories, he said, it just made them work better.

Building the future of Siri

Much of the company’s current AI work centres on Siri, its voice assistant. Apple had originally planned a major overhaul as part of Apple Intelligence, adding features powered by large language models. But that rollout was delayed, leading to internal shakeups and a rethink of the entire system.

Craig Federighi, Apple’s software chief, told employees that trying to merge old and new versions of Siri didn’t work. The team tried to keep the original system for basic tasks like setting timers, while adding generative AI features for more complex requests. But that hybrid setup didn’t meet Apple’s standards. “We realised that approach wasn’t going to get us to Apple quality,” he said.

Now, the team is rebuilding Siri from the ground up. A completely new version is in the works, expected as early as spring 2026. Federighi said the results so far have been strong and could lead to more improvements than originally planned. “There is no project people are taking more seriously,” he told staff.

A key figure behind this new direction is Mike Rockwell, the executive who led development on Apple’s Vision Pro headset. Rockwell and his software team are now leading Siri’s redesign. Federighi said they’ve “supercharged” the work and brought a new level of focus.

Investing in AI talent and tools

Apple is also expanding its AI team quickly. Cook said the company hired 12,000 people in the past year, with 40% of them joining research and development; many of those hires are focused on AI.

Part of the work involves hardware. Apple is building new chips specifically designed for AI, including a more powerful server chip known internally as “Baltra.” The company is also opening an AI server farm in Houston to support future projects.

Beyond Siri, Apple is quietly building what could become a major AI tool. According to Bloomberg‘s Mark Gurman, Apple has formed a team called “Answers, Knowledge, and Information” (AKI). The group’s job is to create search that works more like ChatGPT – giving direct answers rather than just showing links.

The AKI team is led by Robby Walker, who reports to AI chief John Giannandrea, and Apple has already started hiring engineers for the group. While details are still limited, the project appears to include backend systems, search algorithms, and potentially even a standalone app.

A push to move faster

Cook also encouraged employees to start using AI more in their work. “All of us are using AI in a significant way already, and we must use it as a company as well,” he said. He told employees to bring ideas to their managers and find ways to get AI tools into products faster.

The sense of urgency was echoed during Apple’s recent earnings call. The company posted strong results, with nearly 10% growth in the June quarter – enough to ease concerns about slowing iPhone sales and weak results from the Chinese market. Cook told investors Apple would “significantly” increase its spending on AI.

Yet challenges remain. Apple expects to face a $1.1 billion hit from tariffs this quarter and continues to deal with antitrust pressures in the US and Europe, where regulators are watching closely to see how the company runs its App Store and handles user data.

Cook acknowledged these issues at the staff meeting, saying Apple would continue pushing regulators to adopt rules that don’t hurt privacy or user experience. “We need to continue to push on the intention of the regulation,” he said, “instead of these things that destroy the user experience and user privacy and security.”

New stores, new markets

Beyond AI, Cook touched on Apple’s retail strategy. The company plans to open new stores in emerging markets, including India, the United Arab Emirates, and China. A store in Saudi Arabia is also on the way. Apple is also putting more focus on its online store.

“We need to be in more countries,” Cook said, adding that most of Apple’s future growth will come from new markets. That doesn’t mean existing regions will be ignored, but the company sees more opportunity in expanding its global footprint.

What’s next for Apple products

While Cook didn’t reveal any product details, he said, “I have never felt so much excitement and so much energy before as right now.”

Reports suggest Apple is working on several new devices, including a foldable iPhone, new smart glasses, updated home devices, and robotics. A major iPhone redesign is also rumoured for its 20th anniversary next year.

Cook didn’t confirm any of this, but he hinted at big things ahead. “The product pipeline, which I can’t talk about: It’s amazing, guys. It’s amazing,” he said. “Some of it you’ll see soon, some of it will come later, but there’s a lot to see.”

Cautious but confident

Apple’s cautious approach to AI may have slowed it down, but internally, the company seems to believe that slow and steady might win the race. Cook’s message to employees was clear: Apple can still define what useful, responsible AI looks like – and it’s all hands on deck to get there.

(Photo by Apple via YouTube)

Mistral AI gives Le Chat voice recognition and deep research tools
https://www.artificialintelligence-news.com/news/mistral-ai-le-chat-voice-recognition-deep-research-tools/ (Thu, 17 Jul 2025)

Mistral AI has updated Le Chat with voice recognition, deep research tools, and other features to make the chatbot a more helpful assistant.

The company believes that the best AI assistants should help you dive deeper into your thoughts and maintain the flow of conversation. As Mistral AI put it, chatbots are at their best when they “let you go deeper in your thinking, keep your conversation flowing, and maintain contextual continuity.”

A standout feature, albeit somewhat playing catch-up with rivals, is the ‘Deep Research’ mode. Think of it as turning Le Chat into your personal research assistant.

When you ask a complex question, the Deep Research tool breaks it down, finds credible sources, and then builds a structured report with references, making it easy to follow. Mistral designed it to feel like you’re working with a highly organised partner, helping you tackle everything from market trends to scientific topics.

If you prefer talking over typing, the new ‘Vocal’ mode is for you.

Powered by Voxtral, Mistral AI’s new voice model, the Vocal mode allows for natural, low-latency conversations, meaning you can talk to Le Chat without awkward pauses. Mistral says it’s perfect for brainstorming ideas while on a walk, getting quick answers when your hands are full, or transcribing a meeting.

For really complex questions, ‘Think’ mode taps into Mistral AI’s reasoning model, Magistral, to provide clear and thoughtful answers.

One of the most impressive capabilities of Think mode is its native multilingual ability. You can draft a proposal in Spanish, explore a legal concept in Japanese, or just think through an idea in whatever language feels most comfortable. Le Chat can even switch between languages mid-sentence.

To help you stay organised, the new ‘Projects’ feature lets you group related chats into focused folders. Each project remembers your settings and keeps all your conversations, uploaded files, and ideas in one tidy space. It could become the perfect area to manage everything from planning a house move to tracking a long-term work project.

Finally, in a partnership between Mistral AI and Black Forest Labs, Le Chat now includes advanced image editing. This means you can create an image and then fine-tune it with simple commands like “remove the object” or “place me in another city”.

All these new features are available today in Le Chat on the web or by downloading the mobile app.

See also: Military AI contracts awarded to Anthropic, OpenAI, Google, and xAI


The post Mistral AI gives Le Chat voice recognition and deep research tools appeared first on AI News.

Details leak of Jony Ive’s ambitious OpenAI device https://www.artificialintelligence-news.com/news/details-leak-jony-ive-ambitious-openai-device/ Thu, 22 May 2025 16:35:41 +0000

After what felt like an age of tech industry tea-leaf reading, OpenAI has officially snapped up “io,” the much-buzzed-about startup building an AI device from former Apple design guru Jony Ive and OpenAI’s chief, Sam Altman. The price tag? $6.5 billion.

OpenAI put out a video this week talking about the Ive and Altman venture in a general sort of way, but now, a few more tidbits about what they’re actually cooking have slipped out.

And what are they planning with all that cash and brainpower? Well, the eagle-eyed folks at The Washington Post spotted an internal chat between Sam Altman and OpenAI staff where he set a target of shipping 100 million AI “companions.”

Altman allegedly even told his team the OpenAI device is “the chance to do the biggest thing we’ve ever done as a company here.”

To be clear, Altman has set that 100 million number as an eventual target. “We’re not going to ship 100 million devices literally on day one,” he said. But then, in a flex that’s pure Silicon Valley, he added they’d hit that 100 million mark “faster than any company has ever shipped 100 million of something new before.”

So, what is this mysterious “companion”? The gadget is designed to be entirely aware of a user’s surroundings, and even their “life.” While they’ve mostly talked about a single device, Altman did let slip it might be more of a “family of devices.”

Jony Ive, as expected, dubbed it “a new design movement.” You can almost hear the minimalist manifesto being drafted.

Why the full-blown acquisition, though? Weren’t they just going to partner up? Originally, yes. The plan was for Ive’s startup to cook up the hardware and sell it, with OpenAI delivering the brains. But it seems the vision got bigger. This isn’t just another accessory, you see.

Altman stressed the device will be a “central facet of using OpenAI.” He even said, “We both got excited about the idea that, if you subscribed to ChatGPT, we should just mail you new computers, and you should use those.”

Frankly, they reckon our current tech – our trusty laptops, the websites we browse – just isn’t up to snuff for the kind of AI experiences they’re dreaming of. Altman was pretty blunt, saying current use of AI “is not the sci-fi dream of what AI could do to enable you in all the ways that I think the models are capable of.”

So, we know it’s not a smartphone. Altman’s also put the kibosh on it being a pair of glasses. And Jony Ive, well, he’s apparently not rushing to make another wearable, which makes sense given his design ethos.

The good news for the impatient among us (i.e., everyone in tech) is that this isn’t just vapourware. Ive’s team has an actual prototype. Altman’s even taken one home to “live with it”. As for when we might get our hands on one? Altman’s reportedly aiming for a late 2026 release.

Naturally, OpenAI is keeping the actual device under wraps, but you can always count on supply chain whispers for a few clues. The ever-reliable (well, usually!) Apple supply chain analyst Ming-Chi Kuo has thrown a few alleged design details into the ring via social media.

Kuo reckons it’ll be “slightly larger” than the Humane AI Pin, but that it will look “as compact and elegant as an iPod Shuffle.” And yes, like the Shuffle, Kuo says no screen.

According to Kuo, the device will chat with your phone and computer instead, using good old-fashioned microphones for your voice and cameras to see what’s going on around you. Interestingly, he suggests it’ll be worn around the neck, necklace-style, rather than clipped on like the AI Pin.

Kuo’s crystal ball points to mass production in 2027, but he wisely adds a pinch of salt, noting the final look and feel could still change.

So, the billion-dollar (well, £5.1 billion) question remains: will this OpenAI device be the next big thing, the gamechanger we’ve been waiting for? Or will it be another noble-but-failed attempt to break free from the smartphone’s iron grip, joining the likes of the AI Pin in the ‘great ideas that didn’t quite make it’ pile?

Altman, for one, is brimming with confidence. Having lived with the prototype, he’s gone on record saying he believes it will be “the coolest piece of technology that the world will have ever seen.”

See also: Linux Foundation: Slash costs, boost growth with open-source AI


The post Details leak of Jony Ive’s ambitious OpenAI device appeared first on AI News.

Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors https://www.artificialintelligence-news.com/news/deepgram-nova-3-medical-ai-speech-model-healthcare-transcription-errors/ Tue, 04 Mar 2025 13:25:55 +0000

Deepgram has unveiled Nova-3 Medical, an AI speech-to-text (STT) model tailored for transcription in the demanding environment of healthcare.

Designed to integrate seamlessly with existing clinical workflows, Nova-3 Medical aims to address the growing need for accurate and efficient transcription in the UK’s public NHS and private healthcare landscape.

As electronic health records (EHRs), telemedicine, and digital health platforms become increasingly prevalent, the demand for reliable AI-powered transcription has never been higher. However, traditional speech-to-text models often struggle with the complex and specialised vocabulary used in clinical settings, leading to errors and “hallucinations” that can compromise patient care.

Deepgram’s Nova-3 Medical is engineered to overcome these challenges. The model leverages advanced machine learning and specialised medical vocabulary training to accurately capture medical terms, acronyms, and clinical jargon—even in challenging audio conditions. This is particularly crucial in environments where healthcare professionals may move away from recording devices.

“Nova‑3 Medical represents a significant leap forward in our commitment to transforming clinical documentation through AI,” said Scott Stephenson, CEO of Deepgram. “By addressing the nuances of clinical language and offering unprecedented customisation, we are empowering developers to build products that improve patient care and operational efficiency.”

One of the key features of the model is its ability to deliver structured transcriptions that integrate seamlessly with clinical workflows and EHR systems, ensuring vital patient data is accurately organised and readily accessible. The model also offers flexible, self-service customisation, including Keyterm Prompting for up to 100 key terms, allowing developers to tailor the solution to the unique needs of various medical specialties.
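
As a rough illustration of how Keyterm Prompting might be wired up, the hedged sketch below sends pre-recorded audio to Deepgram’s /v1/listen endpoint with the medical model selected and two key terms supplied. The parameter names and response layout follow Deepgram’s published conventions, but verify them against the current API reference before use; the key terms themselves are invented examples.

    import requests

    DEEPGRAM_API_KEY = "your-deepgram-api-key"  # placeholder

    params = [
        ("model", "nova-3-medical"),
        # One ("keyterm", ...) pair per term; up to 100 terms are supported
        # according to Deepgram's announcement.
        ("keyterm", "metoprolol"),
        ("keyterm", "atrial fibrillation"),
    ]

    with open("dictation.wav", "rb") as audio:
        response = requests.post(
            "https://api.deepgram.com/v1/listen",
            headers={
                "Authorization": f"Token {DEEPGRAM_API_KEY}",
                "Content-Type": "audio/wav",
            },
            params=params,
            data=audio,
            timeout=60,
        )

    response.raise_for_status()
    result = response.json()
    # Path below assumes Deepgram's usual pre-recorded response layout.
    print(result["results"]["channels"][0]["alternatives"][0]["transcript"])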

Versatile deployment options – including on-premises and Virtual Private Cloud (VPC) configurations – ensure enterprise-grade security and HIPAA compliance in the US, and support the kind of safeguards needed to meet UK data protection regulations.

“Speech-to-text for enterprise use cases is not trivial, and there is a fundamental difference between voice AI platforms designed for enterprise use cases vs entertainment use cases,” said Kevin Fredrick, Managing Partner at OneReach.ai. “Deepgram’s Nova-3 model and Nova-3-Medical model are leading voice AI offerings, including TTS, in terms of the accuracy, latency, efficiency, and scalability required for enterprise use cases.”

Benchmarking Nova-3 Medical: Accuracy, speed, and efficiency

Deepgram has conducted benchmarking to demonstrate the performance of Nova-3 Medical, which the company claims delivers industry-leading transcription accuracy, optimising both overall word recognition and critical medical term accuracy.

  • Word Error Rate (WER): With a median WER of 3.45%, Nova-3 Medical outperforms competitors, achieving a 63.6% reduction in errors compared to the next best competitor. This enhanced precision minimises manual corrections and streamlines workflows. (A short sketch of how WER is computed follows this list.)
  • Keyword Error Rate (KER): Crucially, Nova-3 Medical achieves a KER of 6.79%, marking a 40.35% reduction in errors compared to the next best competitor. This ensures that critical medical terms – such as drug names and conditions – are accurately transcribed, reducing the risk of miscommunication and patient safety issues.
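
For readers unfamiliar with the metric, WER is simply the word-level edit distance between a reference transcript and the model’s output, divided by the number of reference words. The sketch below computes it from scratch; the transcripts are invented for illustration and have nothing to do with Deepgram’s benchmark data.

    def wer(reference: str, hypothesis: str) -> float:
        """Word Error Rate: word-level edit distance / reference word count."""
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit distance over words rather than characters.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # deleting every reference word
        for j in range(len(hyp) + 1):
            d[0][j] = j  # inserting every hypothesis word
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(
                    d[i - 1][j] + 1,        # deletion
                    d[i][j - 1] + 1,        # insertion
                    d[i - 1][j - 1] + cost, # substitution
                )
        return d[len(ref)][len(hyp)] / len(ref)

    # One substitution ("chess" for "chest") across four reference words: 25%.
    print(wer("patient denies chest pain", "patient denies chess pain"))  # 0.25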

In addition to accuracy, Nova-3 Medical excels in real-time applications. The model transcribes speech 5-40x faster than many alternative speech recognition vendors, making it ideal for telemedicine and digital health platforms. Its scalable architecture ensures high performance even as transcription volumes increase.

Furthermore, Nova-3 Medical is designed to be cost-effective. Starting at $0.0077 per minute of streaming audio – which Deepgram claims is more than twice as affordable as leading cloud providers – it allows healthcare tech companies to reinvest in innovation and accelerate product development.
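
To put that per-minute rate in concrete terms, the back-of-the-envelope calculation below prices a hypothetical monthly transcription volume. The volume is invented purely for illustration and ignores volume discounts or plan differences.

    RATE_PER_MINUTE = 0.0077  # USD per minute of streaming audio, as quoted

    hours_per_month = 10_000  # hypothetical transcription volume
    minutes_per_month = hours_per_month * 60
    monthly_cost = RATE_PER_MINUTE * minutes_per_month
    print(f"${monthly_cost:,.2f} per month")  # $4,620.00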

Deepgram’s Nova-3 Medical aims to empower developers to build transformative medical transcription applications, driving exceptional outcomes across healthcare.

(Photo by Alexander Sinn)

See also: Autoscience Carl: The first AI scientist writing peer-reviewed papers


The post Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors appeared first on AI News.

Top seven Voice of Customer (VoC) tools for 2025 https://www.artificialintelligence-news.com/news/top-seven-voice-of-customer-tools-for-2025/ Mon, 03 Mar 2025 09:32:11 +0000

One of the most powerful methods for enhancing customer experiences and building lasting relationships is through Voice of Customer (VoC) tools. These tools allow businesses to gather insights directly from their customers, helping them to improve services, products, and overall customer satisfaction.

What are voice of customer (VoC) tools?

VoC tools are specialised software applications designed to collect, analyse, and interpret customer feedback. Feedback can come from various sources, including surveys, social media, direct customer interactions, and product reviews. The primary goal of the tools is to build a comprehensive understanding of customer sentiment, pain points, and preferences.

VoC tools let organisations gather qualitative and quantitative data, translating the voice of their customers into actionable insights. By implementing these tools, businesses can achieve a deeper understanding of their customers, leading to informed decision-making and ultimately, enhanced customer loyalty.

Top seven Voice of Customer (VoC) tools for 2025

Here are the top seven VoC tools to consider in 2025, each offering unique features and functions to help you capture the voice of your customers effectively:

1. Revuze

Revuze is an AI-driven VoC tool that focuses on extracting actionable insights from customer feedback, reviews, and surveys.

Key features:

  • Natural language processing to analyse open-ended responses.
  • Comprehensive reporting dashboards that highlight key themes.
  • The ability to benchmark against competitors.

Benefits: Revuze empowers businesses to turn large amounts of feedback into strategic insights, enhancing decision-making and customer engagement.

2. Satisfactory

Satisfactory is a user-friendly VoC tool that emphasises customer feedback collection through satisfaction surveys and interactive forms.

Key features:

  • Simple survey creation with customisable templates.
  • Live feedback tracking and reporting.
  • Integration with popular CRM systems like Salesforce.

Benefits: Satisfactory helps businesses quickly gather customer feedback, allowing for immediate action to improve customer satisfaction and experience.

3. GetFeedback

GetFeedback offers a streamlined platform for creating surveys and collecting customer insights, designed for usability across various industries.

Key features:

  • Easy drag-and-drop survey builder.
  • Real-time feedback collection via multiple channels.
  • Integration capabilities with other tools like Salesforce and HubSpot.

Benefits: GetFeedback provides actionable insights while ensuring an engaging experience for customers participating in surveys.

4. Chattermill

Chattermill focuses on analysing customer feedback through sophisticated AI and machine learning algorithms, turning unstructured data into actionable insights.

Key features:

  • Customer sentiment analysis across multiple data sources.
  • Automated reporting tools and dashboards.
  • Customisable alerts for key metrics and issues.

Benefits: Chattermill enables businesses to react quickly to customer feedback, enhancing their responsiveness and improving overall service quality.

5. Skeepers

Skeepers is designed for brands looking to amplify the customer voice by combining feedback gathering and brand advocacy functions.

Key features:

  • Comprehensive review management system.
  • Real-time customer jury feedback for products.
  • Customer advocacy programme integration.

Benefits: Skeepers helps brands transform customer insights into powerful endorsements, boosting brand reputation and fostering trust.

6. Medallia

Medallia is an established leader in the VoC space, providing an extensive platform for capturing feedback from various touchpoints throughout the customer journey.

Key features:

  • Robust analytics capabilities and AI-driven insights.
  • Multi-channel feedback collection, including mobile, web, and in-store.
  • Integration with existing systems for data flow.

Benefits: Medallia’s comprehensive suite offers valuable tools for organisations aiming to transform customer feedback into strategic opportunities.

7. InMoment

InMoment combines customer feedback across all channels, providing organisations with insights to enhance customer experience consistently.

Key features:

  • AI-powered analytics for deep insights and trends.
  • Multi-channel capabilities for collecting feedback.
  • Advanced reporting and visualisation tools.

Benefits: With InMoment, businesses can create a holistic view of the customer experience, driving improvements across the organisation.

Benefits of using VoC tools

  • Enhanced customer understanding: By capturing and analysing customer feedback, businesses gain insights into what customers truly want, their pain points, and overall satisfaction levels.
  • Improvement of products and services: VoC tools help organisations identify specific areas where products or services can be improved based on customer feedback, leading to increased satisfaction and loyalty.
  • Informed decision making: With access to real-time customer insights, organisations can make data-driven decisions, ensuring that strategies align with customer preferences.
  • Increased customer loyalty: When customers feel heard and valued, they are more likely to remain loyal to a brand, leading to repeat business and long-term growth.
  • Competitive advantage: Organisations that effectively use customer feedback can stay ahead of competitors by quickly adapting to market demands and trends.
  • Proactive issue resolution: VoC tools enable businesses to identify customer complaints early, allowing them to address issues proactively and improve overall customer satisfaction.
  • Enhanced employee engagement: A deep understanding of customer needs can help employees deliver better service, enhancing their engagement and job satisfaction.

How to choose VoC tools

Choosing the right VoC tool involves several considerations:

  • Define your goals: Before researching tools, clearly define what you want to achieve with VoC. Whether it’s improving product features, enhancing customer service, or understanding market trends, outlining your goals will help narrow your choices.
  • Assess your budget: VoC tools come with various pricing models. Determine your budget and evaluate the tools that provide the best value for your investment.
  • Evaluate features: Based on your goals, assess the features of each tool. Prioritise the features that align with your needs, like sentiment analysis, real-time reporting, or integration capabilities.
  • Check integration options: Ensure that the chosen VoC tool can easily integrate with your existing systems. Integration can save time and enhance the overall efficiency of data utilisation.
  • Look for scalability: As your business grows, your VoC needs may change. Choose a tool that can scale with your business and adapt to evolving customer insight demands.
  • Request demos and trials: Take advantage of free trials or request demos to see how the tools function in real-time. The experience can provide valuable information about usability and effectiveness.
  • Read reviews and case studies: Researching customer reviews, testimonials, and case studies can give you insights into how well the tool performs and its impact on businesses similar to yours.

The post Top seven Voice of Customer (VoC) tools for 2025 appeared first on AI News.
