Why did humans develop the eyes we have today?
While scientists cannot go back in time to study the environmental pressures that shaped the evolution of the diverse vision systems present in nature, a new computational framework developed by MIT researchers allows them to trace this evolution in artificial intelligence agents.
The framework they developed, in which embodied AI agents develop eyes and learn to see over several generations, is like a “scientific sandbox” that allows researchers to recreate different evolutionary trees. The user does this by changing the structure of the world and the tasks that AI agents complete, such as finding food or telling objects apart.
This allows them to study why one animal might have evolved simple, light-sensitive patches known as eyes, while another has complex, camera-type eyes.
The researchers’ experiments with this framework demonstrate how tasks drove the evolution of eyes in the agents. For example, they found that navigation tasks often led to the evolution of compound eyes with many individual units, like those of insects and crustaceans.
Agents focused on object discrimination, on the other hand, were more likely to evolve camera-type eyes with an iris and retina.
This framework may enable scientists to investigate “what-if” questions about vision systems that are difficult to study experimentally. It could also guide the design of new sensors and cameras for robots, drones and wearable devices that balance performance with real-world constraints like energy efficiency and manufacturability.
“Although we can never go back and trace every detail of how evolution happened, in this work we have created an environment where we can kind of recreate evolution and examine the environment in all these different ways. This method of doing science opens up a lot of possibilities,” says Kushagra Tiwari, a graduate student in the MIT Media Lab and co-lead author of a paper on this research.
He is joined on the paper by co-lead author and fellow graduate student Aaron Young; graduate student Tzofi Klinghoffer; former postdoc Akshat Dave, now an assistant professor at Stony Brook University; Tommaso Poggio, the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, an investigator at the McGovern Institute, and co-director of the Center for Brains, Minds, and Machines; co-senior authors Brian Cheung, a postdoc at the Center for Brains, Minds, and Machines and visiting assistant professor at the University of California, San Francisco, and Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; as well as others at Rice University and Lund University. The research appears today in Science Advances.
Building a Scientific Sandbox
The paper began as a conversation between researchers about exploring new vision systems that could be useful in various fields such as robotics. To test their “what if” questions, the researchers decided to use AI to explore several evolutionary possibilities.
“What-if questions inspired me when I was growing up studying science,” says Tiwari. “With AI, we have a unique opportunity to create these embodied agents that allow us to ask the types of questions that would normally be impossible to answer.”
To create this evolutionary sandbox, the researchers took all the elements of a camera, such as the sensor, lens, aperture, and processor, and converted them into parameters that an embodied AI agent could learn.
They used those building blocks as a starting point for an algorithmic learning mechanism that an agent would use as its eyes evolved over time.
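A minimal sketch of what such a parameterization might look like in code. All names, defaults, and the budget check below are illustrative assumptions, not the paper's actual encoding:

```python
from dataclasses import dataclass

@dataclass
class EyeGenome:
    """Camera elements expressed as evolvable parameters (illustrative)."""
    num_photoreceptors: int = 1   # agents begin with a single photoreceptor
    fov_deg: float = 60.0         # lens field of view
    aperture_f: float = 2.8       # f-number controlling light intake
    eye_yaw_deg: float = 0.0      # where the eye sits on the agent's body

    def within_budget(self, max_pixels: int) -> bool:
        # Environments impose resource constraints, e.g. a cap on pixels.
        return self.num_photoreceptors <= max_pixels

g = EyeGenome()
assert g.within_budget(max_pixels=64)
```

Treating each optical element as a plain numeric parameter is what lets an evolutionary algorithm mutate and select over eye designs the same way it would over neural network weights.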
“We couldn’t simulate the entire universe atom-by-atom. It was challenging to determine which materials we needed, which materials we didn’t need, and how to allocate resources across those different elements,” says Cheung.
Within the framework, an evolutionary algorithm chooses which elements to evolve based on the constraints of the environment and the agent’s task.
Each environment poses a single task, such as navigation, food identification, or prey tracking, designed to mimic the real visual challenges animals must overcome to survive. Agents start with a single photoreceptor for seeing the world and an associated neural network model that processes the visual information.
Then, over the lifetime of each agent, it is trained using reinforcement learning, a trial-and-error technique where the agent is rewarded for completing its task goal. The environment also includes constraints, such as a certain number of pixels for an agent’s visual sensor.
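A toy illustration of that lifetime loop, reduced to a tabular reward-learning sketch under assumed details (the actual framework trains a neural network policy): the agent's sensor is capped at a few "pixels," and trial and error teaches it which way to move toward food.

```python
import random

random.seed(0)

NUM_PIXELS = 4  # environment constraint: the sensor's resolution is capped

def observe(food_pos: float) -> int:
    """Quantize the food's position in [0, 1) into one of NUM_PIXELS bins."""
    return min(int(food_pos * NUM_PIXELS), NUM_PIXELS - 1)

def train(episodes: int = 2000, eps: float = 0.1, lr: float = 0.2):
    # Tabular reward estimates: q[bin][action], action 0 = left, 1 = right.
    q = [[0.0, 0.0] for _ in range(NUM_PIXELS)]
    for _ in range(episodes):
        food = random.random()
        s = observe(food)
        # Trial and error: explore with probability eps, else act greedily.
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = 0 if q[s][0] >= q[s][1] else 1
        # Reward the agent for moving toward the food's side of the world.
        r = 1.0 if (a == 0) == (food < 0.5) else 0.0
        q[s][a] += lr * (r - q[s][a])
    return q

q = train()
# The learned greedy choice points toward the food in the extreme bins.
assert q[0][0] > q[0][1]
assert q[NUM_PIXELS - 1][1] > q[NUM_PIXELS - 1][0]
```

The key point the sketch captures is that the pixel cap bounds what the agent can ever learn: with only four bins of input, no amount of extra training or policy capacity recovers finer position information.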
“These constraints drive the design process, in the same way that there are physical constraints in our world, such as the physics of light, that have driven the design of our own eyes,” says Tiwari.
Over many generations, agents develop different elements of vision systems that maximize rewards.
Their framework uses a genetic encoding mechanism to computationally mimic evolution, where individual genes are mutated to control the evolution of an agent.
For example, morphological genes capture how the agent perceives the environment and control the location of the eyes; optical genes determine how the eye interacts with light, including the number of photoreceptors; and neural genes control the agents’ ability to learn.
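One way to picture that encoding, with hypothetical gene names and mutation rules (the paper's actual genome and mutation operators may differ):

```python
import random

random.seed(1)

# Three gene groups; group and parameter names are illustrative assumptions.
genome = {
    "morphological": {"eye_count": 1, "eye_angle_deg": 0.0},    # eye placement
    "optical":       {"photoreceptors": 1, "aperture_f": 2.8},  # light interaction
    "neural":        {"hidden_units": 8},                       # learning capacity
}

def mutate(parent: dict, rate: float = 0.3) -> dict:
    """Return a child genome with each gene perturbed with probability `rate`."""
    child = {group: dict(params) for group, params in parent.items()}
    for params in child.values():
        for key, val in params.items():
            if random.random() < rate:
                if isinstance(val, int):
                    params[key] = max(1, val + random.choice([-1, 1]))
                else:
                    params[key] = val * random.uniform(0.9, 1.1)
    return child

child = mutate(genome)
assert set(child) == {"morphological", "optical", "neural"}
assert child["optical"]["photoreceptors"] >= 1
```

Across generations, selection keeps the mutated genomes whose agents earn the most reward, so changes to any of the three gene groups can persist if they help with the task.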
Testing Hypotheses
When researchers set up experiments in this framework, they found that tasks had a large impact on the vision systems developed by the agents.
For example, agents that were focused on navigation tasks evolved eyes designed to maximize spatial awareness through low-resolution sensing, while agents that detected objects evolved eyes that focused more on frontal acuity rather than peripheral vision.
Another experiment showed that a bigger brain is not always better when it comes to processing visual information. Depending on physical constraints such as the number of photoreceptors in the eye, only so much visual information can go into the system at a time.
“At some point a bigger brain doesn’t help the agents at all, and in nature it would be a waste of resources,” says Cheung.
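The bottleneck behind that observation can be made concrete with a back-of-the-envelope capacity bound. The formula below is a standard information-theoretic cap, not a quantity from the paper:

```python
import math

def sensor_bits(photoreceptors: int, levels: int = 4) -> float:
    """Upper bound on bits per frame entering the visual system: each of
    `photoreceptors` sensors reports one of `levels` intensity values."""
    return photoreceptors * math.log2(levels)

# With 8 photoreceptors at 4 intensity levels, at most 16 bits per frame
# reach the brain, so network capacity beyond that cannot add information.
assert sensor_bits(8) == 16.0
```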
In the future, the researchers want to use this simulator to identify the best vision systems for specific applications, which could help scientists develop task-specific sensors and cameras. They also want to integrate large language models (LLMs) into the framework to make it easier for users to ask “what-if” questions and explore additional possibilities.
Cheung says, “There is a real benefit to asking questions in a more imaginative way. I hope it will inspire others to create larger frameworks, where instead of focusing on narrow questions covering a specific area, they seek to answer questions with a much broader scope.”
This work was supported, in part, by the Center for Brains, Minds, and Machines and the Defense Advanced Research Projects Agency (DARPA) Mathematics for the Discovery of Algorithms and Architectures (DIAL) program.