Microsoft brings out a small language model that can look at pictures

Illustration: The Verge

Microsoft announced a new version of its small language model, Phi-3, which can look at images and tell you what’s in them.

Phi-3-vision is a multimodal model, meaning it can read both text and images, and is best used on mobile devices. Microsoft says Phi-3-vision, now available in preview, is a 4.2 billion parameter model (the parameter count is a rough measure of a model’s size and complexity) that can handle general visual reasoning tasks, such as answering questions about charts or images.

But Phi-3-vision is far smaller than other image-focused AI models like OpenAI’s DALL-E or Stability AI’s Stable Diffusion. Unlike those models, Phi-3-vision doesn’t generate images, but it can understand what’s in an image and analyze it for a…
