How AI Sees Your City? is a research project using large language models to explore how people perceive cities. By combining street-level imagery with AI-based evaluative reasoning, the project generates high-resolution maps of urban perception across seven dimensions: beauty, walkability, safety, liveliness, comfort, historic character, and wealth.
The goal is to create scalable, transparent measures of how cities are experienced by their inhabitants, and to make these measures available for research, planning, and public debate. The project is conducted at the Center for Collective Learning at the IAST, Toulouse School of Economics, and is supported by the DUT-COLLINE grant, a European research initiative focused on innovative methods for understanding urban environments.
For centuries, urban planners, architects, and social scientists have sought to decode the silent language of cities. Why do some plazas invite us to linger while others compel us to rush through? What makes a streetscape feel safe, walkable, or beautiful? And how do these visual cues affect the future trajectory of neighborhoods?
Quantitative urban perception research began with small-scale surveys in the 1960s, but over the last fifteen years the field has been transformed by crowdsourcing and machine learning methods. This project continues both traditions: the long one in architecture and urban planning, and the shorter one at our research group (the Center for Collective Learning).
Cities are not just a collection of roads and buildings. In the words of Kevin Lynch, they are a form of "temporal art." In his seminal 1960 work, The Image of the City, Lynch interviewed dozens of pedestrians and introduced concepts like "imageability" and "mental maps," arguing that navigation is driven by how easily we can recognize landmarks, paths, and edges.
A few decades later, scholars like Jack Nasar expanded this inquiry into the emotional and evaluative realm. In The Evaluative Image of the City (1998), Nasar reviewed dozens of studies, formalizing the idea that people are not passive observers but are constantly evaluating the cities around them.
In 2011, the launch of Place Pulse turned urban perception research into a "game" based on paired comparisons. Instead of rating images on subjective 1-10 scales, users were asked: Which place looks safer? Which place looks wealthier? These clicks accumulated into hundreds of thousands of judgments across thousands of images.
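A comparison-based dataset like this can be turned into per-image scores in several ways; Place Pulse's own Q-score also corrects for which images each image was compared against. The sketch below shows the simplest variant, plain win rates, with hypothetical image identifiers standing in for real street-view photos:

```python
from collections import defaultdict

def win_rate_scores(comparisons):
    """Turn pairwise judgments into per-image scores.

    `comparisons` is a list of (winner_id, loser_id) tuples, one per
    "Which place looks safer?" click. Each image's score is the share
    of comparisons it won -- a simplified stand-in for Place Pulse's
    Q-score, which additionally accounts for opponent strength.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {img: wins[img] / appearances[img] for img in appearances}

# Example: three judgments over three street-view images.
judgments = [("img_a", "img_b"), ("img_a", "img_c"), ("img_c", "img_b")]
print(win_rate_scores(judgments))  # img_a wins every comparison it appears in
```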
By 2014, the "Streetscore" project was using this data to train machine learning models, creating high-resolution perception maps for 21 American cities.
Soon afterward, Convolutional Neural Networks (CNNs) replaced handcrafted image features. Our team capitalized on this shift with "Deep Learning the City", analyzing millions of images globally. Follow-up research also connected perception data to mobile phone traces, showing that people avoid streetscapes perceived as unsafe.
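The exact pipelines differ between these papers, but the general recipe is a pretrained CNN used as a frozen feature extractor, followed by a simple regressor trained on crowd-derived perception scores. The following is a minimal sketch under those assumptions; the choice of ResNet-50 and the placeholder names `train_paths`, `train_scores`, and `new_city_paths` are illustrative, not the projects' actual code:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.linear_model import Ridge

# A pretrained CNN used as a fixed feature extractor (no fine-tuning).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the ImageNet classification head
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(paths):
    """Return one 2048-d feature vector per street-level image."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        return backbone(batch).numpy()

# Placeholder training data: image paths plus crowd-derived scores
# (e.g. the win rates computed above).
# X = embed(train_paths)
# model = Ridge().fit(X, train_scores)
# predicted = model.predict(embed(new_city_paths))
```

Once the regressor is trained, scoring every panorama along a city's street network is what yields the high-resolution perception maps described above.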
Starting in the 2020s, research shifted to prompting "foundation models" like CLIP and multimodal large language models (MLLMs). With GPT-4V, the human comparison step can be automated at low cost, though audits remain necessary to address risks like "geo-hallucination".
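As an illustration of what automating the comparison step can look like, the sketch below poses a Place Pulse-style question to a multimodal model through the OpenAI chat API. The model name, prompt wording, and image URLs are assumptions for illustration, not the project's actual protocol, and any real deployment would add the audits mentioned above:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def compare(url_a, url_b, dimension="safe"):
    """Ask a multimodal model the same question Place Pulse asked humans."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for whichever MLLM is being evaluated
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Which of these two street scenes looks more {dimension}? "
                         "Answer with 'A' or 'B' and one sentence of reasoning."},
                {"type": "image_url", "image_url": {"url": url_a}},
                {"type": "image_url", "image_url": {"url": url_b}},
            ],
        }],
    )
    return response.choices[0].message.content

# print(compare("https://example.com/scene_a.jpg", "https://example.com/scene_b.jpg"))
```

Asking for a brief justification alongside the A/B choice also gives auditors something to inspect when checking for failure modes such as geo-hallucination.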