
Why 2024 Is The Year Of Vision With Apple Vision Pro, AI Wearables And LVMs


The future of tech is wearable, AI-powered and spatially aware.

If 2023 was the year of Large Language Models, with OpenAI’s ChatGPT amassing millions of users in a record-setting few months, then all signals point toward 2024 being the year that Large Vision Models (LVMs) are unlocked, AI-driven Spatial Computing reaches the mass market, and computer vision and wearable AI that can see the world make significant strides.

Technology is accelerating at record speed, and next year will be no exception. That’s why we see 2024 as the year of Vision.

The post-smartphone future my colleague and I once envisioned is slowly taking shape. It’s a future where a new device, a spatial computer in the form of a wearable, overtakes the smartphone in everything from navigation and personal assistants to how we access information and experiences.

Meta’s Ray-Ban smart glasses recently went multimodal, Amazon’s Echo Frames got a revamp this year, and Humane launched its Ai Pin. Microsoft is adding its Copilot AI assistant to the HoloLens 2, and Google presented a Gemini video this year showcasing the seeing-AI capabilities Gemini could have in the future. Google, Samsung and Qualcomm also announced they are partnering on a mixed reality device expected in 2025.

OpenAI also has its sights set (pun intended) on a future device that can be used to engage with its models in new ways. A recent article in The Information mentioned that “OpenAI recently discussed embedding its object recognition software, GPT-4 with Vision, into products from Snapchat’s parent company, according to a person familiar with the situation. That could result in new features for Snap’s Spectacles smart glasses.”

With most of big tech’s players exploring what hardware might replace our computers first and, eventually, our mobile phones, it’s not far-fetched to say that the devices we saw in late 2023 and will see in 2024 are transitional devices that will continue to evolve and mature over the next decade, garnering more and more attention and, eventually, adoption from consumers.

Let’s dive deeper into Computer Vision, Seeing Wearable AI, LVMs, and Apple’s spatial computer, the Apple Vision Pro.

Computer Vision And Seeing AI

Computer vision is a subset of artificial intelligence. In extremely simple terms, computer vision is what allows machines to “see.” Machines with computer vision are normally trained for a specific use case, like inspecting a part on an assembly line.

Computer vision can analyze a product for defects more quickly than a human can. It is one of the essential ingredients in making wearables work and machines see. However, for it to handle the range of use cases an everyday person might come across, it needs to be combined with more AI. For instance, Meta says that now that its Ray-Ban smart glasses have gone multimodal, its AI will let them see the world from the wearer’s perspective for the first time.
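To make the assembly-line example concrete, here is a minimal sketch of classic computer vision in Python with OpenCV. It assumes a fixed camera, images that line up pixel-for-pixel, and a known-good reference photo; the file names and tolerance values are hypothetical, chosen only for illustration.

```python
import cv2
import numpy as np

# Hypothetical photos: a known-good part and the part under inspection,
# taken by a fixed camera so the images align pixel-for-pixel.
reference = cv2.imread("reference_part.png", cv2.IMREAD_GRAYSCALE)
sample = cv2.imread("inspected_part.png", cv2.IMREAD_GRAYSCALE)

# Pixel-wise difference highlights where the sample deviates from the
# known-good reference.
diff = cv2.absdiff(reference, sample)

# Keep only significant deviations; small differences are sensor noise.
_, defect_mask = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)

# Flag the part if the deviating area exceeds a tolerance (arbitrary here).
defect_pixels = int(np.count_nonzero(defect_mask))
if defect_pixels > 500:
    print(f"Defect suspected: {defect_pixels} pixels differ from reference")
else:
    print("Part passes inspection")
```

Production systems layer alignment and learned models on top of this, but the core idea is the same: turning pixels into a pass/fail decision faster than a human inspector could.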

Computer vision and artificial intelligence converge in Spatial Computing. “Spatial Computing is a scale technology that gets its ‘eyes and ears’ from AI, Computer Vision, and ushers in the era of Large Vision Models (LVM).”

Below, let’s discuss Spatial Computing in more detail.

Large Vision Models

While only a few people are talking about Large Vision Models yet, they are a topic of interest in Silicon Valley.

A recent LinkedIn post and video by renowned AI luminary Andrew Ng highlighted LVMs as follows: “The LVM revolution is coming a little after the LLM one, and will transform how we process images. But there’s an important difference between LLMs and LVMs. Internet text is similar enough to proprietary text documents that an LLM trained on internet text can understand your documents, but internet images – such as Instagram pictures – contain a lot of pictures of people, pets, landmarks, and everyday objects. Many practical vision applications (manufacturing, aerial imagery, life sciences, etc.) use images that look nothing like most internet images. So a generic LVM trained on internet images fares poorly at picking out the most salient features of images in many specialized domains.”

The AR smart glasses we imagined are coming to life, in part thanks to hardware design (more on that in the next section), but also thanks to AI and Large Vision Models (LVMs). LVMs recognize images. They can describe scenes, objects, and even emotions. LVMs are what smart glasses and other wearables will use to process visual data. They use deep learning to detect patterns and connections within and between images and, eventually, videos.

In Meta’s preview of its latest AI-enabled Ray-Bans, the wearer asks how they should grill their food. Large Vision Models are what enable the Ray-Bans (or other wearables) to process the image of the food on the grill, categorize it, and give a response. To get the most use out of our wearable devices, we need them to be able to process the visual world we live in. Large Vision Models have evolved to see our world (not without some hallucinations).
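Meta hasn’t published which model powers the glasses, but the describe-what-you-see pattern can be sketched with an open image-captioning model. Below is a minimal sketch in Python using BLIP via Hugging Face’s transformers library; the photo file name is a hypothetical stand-in for a frame from the wearer’s camera.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load an open image-captioning model as a stand-in for whatever
# proprietary LVM runs behind devices like the Ray-Bans.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Hypothetical frame captured from the wearer's point of view.
image = Image.open("grill_photo.jpg").convert("RGB")

# Encode the image and generate a natural-language description of the scene.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

A caption like this is only the first step; a full assistant would pass the description (or the image itself) to a language model to answer the wearer’s actual question about grilling.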

From an enterprise perspective, in Andrew Ng’s LinkedIn post and video mentioned above, he was joined by Dan Maloney of Landing AI, who explained that in their research, models adapted to images of a particular domain (such as semiconductor manufacturing or pathology) tend to do much better. He went on to say, “At Landing AI, by using ~100K unlabeled images to adapt an LVM to a specific domain, we see significantly improved results, for example, where only 10-30% as much labeled data is now needed to achieve a certain level of performance.”

Ng continued, “For companies with large sets of images that look nothing like internet images, I think domain-specific LVMs can be a way to unlock considerable value from their data.” So LVMs could prove extremely valuable for the enterprise and for domain-specific use cases as well.
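Landing AI’s exact recipe isn’t spelled out in the post; adapting with unlabeled images typically means self-supervised pretraining. As a hedged illustration of the downstream step, where an adapted backbone lets a small labeled set go further, here is a sketch in Python with PyTorch that freezes a pretrained backbone and trains only a small classification head. The domain_images/ folder and all hyperparameters are hypothetical.

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Hypothetical labeled domain data: domain_images/train/<class_name>/*.jpg
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("domain_images/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

# Start from generic pretrained weights; a domain-adapted checkpoint would
# be swapped in here. Freeze the backbone and train only a new head, so a
# small labeled set goes further.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few epochs is often enough for a linear head
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

Swapping the generic ImageNet weights for a domain-adapted checkpoint is what, per Ng and Maloney, shrinks the amount of labeled data needed to hit a given level of performance.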

Apple Vision Pro, visionOS, And Spatial Computing

Competition for the future of wearable AI is already heating up for 2024. As mentioned, Apple, Meta, Amazon and Snap are all gearing up their smart glasses and mixed reality headsets to be your device of choice. Meta calls it a “platform shift,” one where AI will be the primary way humans interact with machines. We see it slightly differently: one where AI-enabled machines interact with humans the way humans see the world. We will still see through the eyes of the machine, aka our smart glasses, but the AI in the glasses will interact with us to make sense of everything it and its human counterpart see.

Meta’s AI Ray-Ban glasses and Snap Spectacles with possible OpenAI integration are all products to keep an eye on. But Apple’s Vision Pro is still what inspired us to write A Wearable World. Apple is already prepping users to be Vision Pro-ready with spatial video recording features on the iPhone 15, and it is rumored to be training Apple Genius employees on the Vision Pro. It’s the one device powerful enough to immerse its wearer in a virtual rainforest, or to show a prototype of a product and let you virtually upgrade and test it. It’s a spatial computer that can see the world and engage with it in some of the same ways you do.

Spatial Computing is an evolving 3D-centric form of computing that, at its core, uses AI, Computer Vision and extended reality to blend virtual experiences into the physical world, breaking free from screens and making all surfaces spatial interfaces. It allows humans, devices, computers, robots and virtual beings to navigate through computing in 3D space. It ushers in a new paradigm for human-to-human and human-computer interaction, enhancing how we visualize, simulate, and interact with data in physical or virtual locations, and expanding computing beyond the confines of the screen into everything you can see, experience, and know.

Spatial Computing allows us to navigate the world alongside robots, drones, cars, virtual assistants, and beyond, and it is not limited to one technology or one device. It is a mix of software, hardware and information that allows humans and technology to connect in new ways, ushering in a form of computing that could be even more impactful to society than personal computing and mobile computing have been.

AI Wearables That Can See Our World

How we engage with each other and interact with technology will change when AI wearables become the default device.

But imagining a wearable world didn’t start with the announcement of Apple’s Vision Pro mixed reality headset. We first imagined a post-smartphone world in 2020 when we wrote A Day in AR Glasses. In the article, we imagined a woman named Katie who went through her whole day, did her work, and visited friends, all through her AR glasses. She interacted with AI holograms for her workplace maintenance and turned her lunch break into an art exhibit. While we mentioned artificial intelligence in our work, it didn’t take center stage.

Generative AI and ChatGPT unleashed our imaginations in 2023. In 2024, our ideas will solidify. 2024 will be the year of vision. From computer vision to Large Vision Models, this is the year we will see through the eyes of the machine, and wearable technology will become a lot more visible, interesting, and competitive. While text still reigns supreme, Vision, in many forms, will change the technology landscape in exciting and unforeseen ways and will usher in a new tech race. Are you prepared for the year when vision starts to take center stage?
