Lemurian Labs is building a new compute paradigm to reduce cost of running AI models

It’s fair to say that Nvidia has found itself in the right place at the right time with demand for its GPU chips at an all-time high, thanks to the resource demands of generative AI models — but what if there were a chip that provided similar power at a lower cost? That’s what Lemurian Labs, an early-stage startup from Google, Intel and Nvidia alumni, is trying to build.

To be sure, it’s something of a moonshot, and it takes a lot of time and money to get a chip to market, but when an idea like this comes from founders with a certain pedigree, investors are willing to take a chance on it. Today, the startup announced a $9 million seed investment.

“Fundamentally, at Lemurian, our goal is to reimagine accelerated computing. And the reason we want to do that is because the existing way we have done computing is starting to come to an end. And it’s not so much that it’s not a great architecture or paradigm, it is that the physics of semiconductors is pushing back against that paradigm,” Jay Dawani, co-founder and CEO at Lemurian, told TechCrunch.

The company aims to build a new chip along with software to make processing AI workloads more accessible, efficient, cheaper and ultimately more environmentally friendly.

As though holding a master class in computer architecture, Dawani explains that computing comes down to three things: “There’s math, there’s memory and then there’s movement. The goal is interconnects. So data gets stored in memories that gets moved through an interconnect into a math unit where it gets manipulated, then it gets written back in memory. So that is the traditional point in architecture: data has to travel,” he said.

Lemurian wants to flip that approach. Instead of making the data travel to the compute resources, it wants the compute to move to the data. “What we’re saying is we need to essentially minimize that distance, so that we aren’t really moving data, we’re moving around compute,” he said.

He says that GPUs were essentially created for graphics-related tasks, but over time have taken on a variety of other roles because of their pure processing capabilities. “Because you’re designing for something, but also trying to do something else, and when you’re trying to do everything, you’re not really that great at doing everything. And that’s really the Achilles’ heel of a GPU. And that’s what we’re trying to fix,” Dawani said.

The way Lemurian wants to answer this is to change the math on the chip, a huge undertaking, no doubt. As Dawani tells it, in the early days of chip development, engineers went with a floating-point approach because nobody could get a logarithmic approach working. He claims that his company has solved that problem.

“And the beauty of a log number system is that it turns all those expensive multiplies and divides into adds and subtractions, which are very free operations in hardware. So you save on area and energy and you gain speed. And you also gain a bit on exactness or precision,” Dawani said. All of that is attractive when trying to bring down the cost of running large language models.
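The arithmetic behind that claim can be illustrated with a toy log number system (LNS). This is a minimal sketch of the general idea, not Lemurian's format, which the article does not describe; the `LNS` class and the 8-bit fractional log field (`FRAC_BITS`) are hypothetical choices for illustration. Each value is stored as a sign bit plus a fixed-point integer holding log2 of its magnitude, so multiplying two values becomes an integer addition and dividing becomes an integer subtraction.

```python
import math
from dataclasses import dataclass

# Hypothetical parameter: number of fractional bits in the fixed-point log field.
FRAC_BITS = 8


@dataclass(frozen=True)
class LNS:
    """Toy log-number-system value: a sign bit plus a fixed-point log2 magnitude."""
    sign: int   # 0 = positive, 1 = negative
    log_q: int  # round(log2(|x|) * 2**FRAC_BITS), stored as a signed integer

    @staticmethod
    def encode(x: float) -> "LNS":
        # Zero has no logarithm; real LNS formats reserve a special code for it.
        if x == 0:
            raise ValueError("zero needs a special encoding; omitted in this sketch")
        return LNS(int(x < 0), round(math.log2(abs(x)) * (1 << FRAC_BITS)))

    def decode(self) -> float:
        # Convert the fixed-point log back to an ordinary float.
        mag = 2.0 ** (self.log_q / (1 << FRAC_BITS))
        return -mag if self.sign else mag

    def __mul__(self, other: "LNS") -> "LNS":
        # Multiplication: XOR the signs, add the integer logs.
        return LNS(self.sign ^ other.sign, self.log_q + other.log_q)

    def __truediv__(self, other: "LNS") -> "LNS":
        # Division: XOR the signs, subtract the integer logs.
        return LNS(self.sign ^ other.sign, self.log_q - other.log_q)
```

For example, `(LNS.encode(8.0) * LNS.encode(-0.5)).decode()` gives -4.0 exactly, because both logarithms are whole numbers, while a product such as 6.0 × 1.5 comes out close to, but not exactly, 9.0 because the logarithms are rounded when encoded. Addition is the hard part in any LNS: summing two values requires a nonlinear function of the difference of their logarithms, log2(1 + 2^d), which hardware typically approximates with lookup tables. The article does not say how Lemurian handles this.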

How did they do this? “We actually stumbled across the realization that by constructing in a certain way, and extending the definition of a large number system, you can actually create an exact solution, which ends up being smaller and more accurate than floating point for the same number of bits,” he said.

“And as you increase the number of bits, it grows better and better in dynamic range for the same number of bits, which is really, really fascinating. Now, that is a big part of what allows us to explore the architecture we did because without the number system you succumb to the same limitations.”

They are taking a go-slow approach, releasing the software part of the stack first, which they hope to have generally available in Q3 next year. The hardware is much more challenging and will take time and money to develop, manufacture and test in production, but the goal is for that to follow in the coming years.

The company currently has 24 employees, mostly highly skilled engineers with backgrounds in this kind of project. That’s a limited pool of people, but Dawani’s goal is to hire six more over the next several months and, if all goes well and the company raises a Series A, another 35 in the next year.

The $9 million investment was led by Oval Park Capital with participation from Good Growth Capital, Raptor Group and Alumni Ventures, among others.

Building a company like this and getting the chip to market represents a huge and expensive challenge, but if they can pull off what they describe, it could make building generative AI models (and whatever comes next) much cheaper and more efficient.

Source @TechCrunch