For those who are not familiar with GeoGuessr, it's a simple and fun game in which you are placed at a random world location on Google Maps and have to guess the location before a countdown ends. The goal is to get your guess as close as possible to the real location. After playing this game with some friends, I started to think about how I could use the game's concept to build something that lets me practice with generative AI. That was how this project, "GenAI GeoGuesser", was born. In my version of the game you have to guess the country name based on hints generated by AI models. To help with the understanding, here are a few screenshots showcasing the game's workflow.
First, the user selects the desired hint modalities. You can choose any number of options among "Audio", "Text" and "Image", and you also have to select the number of hints that will be generated for each modality. For the example above you would get 1 hint for each of the 3 types.
The text hint will be a textual description of the country.
The image hints will be images that resemble the country.
Finally, the audio hint should be an audio clip or sound related to the country (in my experience the audio hints don't work as well as the other two).
All the models used to generate the hints above have parameters to fine-tune the generation process. You could generate longer text or audio hints, or even change the models. The repository has intuitive parameters to play with.
Once you finish evaluating all the hints and are ready to guess, type the guess in the "Country guess" field.
If the guess is wrong, you will get the correct country name and the distance between your guess and the correct place.
If the guess is correct, you will receive a congratulations message.
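The article does not show how the distance feedback is computed, so the following is only a minimal sketch of one common approach: compare the normalized names, and if they differ, measure the great-circle (haversine) distance between two reference points. The `COORDS` table, its approximate coordinates, and the `check_guess` helper are hypothetical names introduced for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical lookup table: very rough central points (latitude, longitude).
COORDS = {
    "brazil": (-10.0, -55.0),
    "argentina": (-34.0, -64.0),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_guess(guess, answer, coords=COORDS):
    """Return (is_correct, distance_km); distance is None for unknown names."""
    guess_key = guess.strip().lower()
    answer_key = answer.strip().lower()
    if guess_key == answer_key:
        return True, 0.0
    if guess_key not in coords or answer_key not in coords:
        return False, None
    return False, haversine_km(*coords[guess_key], *coords[answer_key])
```

Calling `check_guess("Argentina", "Brazil")` returns `False` together with the distance in kilometres between the two reference points, which is the kind of feedback shown for a wrong guess.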
Now that you are familiar with the game's workflow, let's understand what is happening under the hood at each step.
The game begins with the country selection. Here I wanted to mimic the original GeoGuessr behavior, where you are more likely to be dropped into larger countries simply because they cover more ground. For that reason, picking a country uniformly at random would not be enough, since small countries would have the same chance as large ones. Luckily, I found the countryinfo library, which provides a list of countries and some metadata such as each country's area. Below you can see what the code looks like.
Selecting the country
import pandas as pd
from countryinfo import CountryInfo

country_list = list(CountryInfo().all().keys())
# build a dict with country:area pairs
country_df = {
    country: CountryInfo(country).area() for country in country_list
}
country_df = pd.DataFrame(country_df.items(), columns=["country", "area"])
# pick a random country where the probability is proportional to the country's area
country = country_df.sample(n=1, weights="area")["country"].iloc[0]
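To see the effect of area weighting without installing anything, here is a small standard-library sketch of the same idea using `random.choices`. The areas below are rough, illustrative figures that I am assuming for the example (they are not read from countryinfo), and `pick_country` and `selection_probabilities` are hypothetical helper names.

```python
import random
from collections import Counter

# Rough, illustrative areas in km^2 (assumed values, not taken from countryinfo).
AREAS_KM2 = {
    "Russia": 17_100_000,
    "Brazil": 8_500_000,
    "Portugal": 92_000,
    "Luxembourg": 2_600,
}


def pick_country(areas, rng=random):
    """Pick one country with probability proportional to its area."""
    countries = list(areas)
    weights = [areas[name] for name in countries]
    return rng.choices(countries, weights=weights, k=1)[0]


def selection_probabilities(areas):
    """Exact probability of each country being selected."""
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


# Simulate many rounds with a seeded generator so the run is repeatable.
rng = random.Random(42)
draws = Counter(pick_country(AREAS_KM2, rng) for _ in range(10_000))
probabilities = selection_probabilities(AREAS_KM2)
```

With these numbers the largest country is selected in roughly two out of three rounds, while the smallest one almost never appears, which is exactly the bias the game is aiming for.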
Text hints
For the text hint generation step, I chose a Gemma model. The version with 2 billion parameters is able to generate high-quality text while still running fast enough not to disrupt the user experience. The Gemma models are a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("google/gemma-1.1-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-1.1-2b-it")
prompt = f"Describe the country {country} without mentioning its name"
# tokenize the prompt (returns input_ids and attention_mask tensors)
input_ids = tokenizer(prompt, return_tensors="pt")
text_hint = model.generate(**input_ids)
# generate() returns a batch, so decode the first sequence,
# then remove the prompt from the output and clean up
text_hint = (
    tokenizer.decode(text_hint[0], skip_special_tokens=True)
    .replace(prompt, "")
)
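Even with a prompt that asks the model not to mention the name, small models sometimes leak it. The article does not say whether the project guards against this, so treat the following as an optional sketch of a post-processing step. `redact_country` and its `placeholder` argument are hypothetical names of my own.

```python
import re


def redact_country(text, country, placeholder="this country"):
    """Replace whole-word, case-insensitive mentions of the country name."""
    pattern = re.compile(rf"\b{re.escape(country)}\b", flags=re.IGNORECASE)
    return pattern.sub(placeholder, text).strip()


hint = "Brazil is famous for its rainforest. Many Brazilian cities line the coast of brazil."
clean_hint = redact_country(hint, "Brazil")
```

Because the pattern uses word boundaries, demonyms such as "Brazilian" are left untouched, so this only covers the most direct kind of leak.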
You can also run the text hint generation using Gemini models via Vertex AI to get faster and higher-quality outputs (check the configs file).
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-pro-preview-0409")
prompt = f"Describe the country {country} without mentioning its name"
responses = model.generate_content(prompt)
# extract the text from the output
text_hint = responses.candidates[0].content.parts[0].text
Image hints
For the image generation part, I chose the SDXL-Turbo model, a version of the popular Stable Diffusion model that can generate high-quality images with as little as a single inference step.
from diffusers import AutoPipelineForText2Image

model = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
prompt = f"An image related to the country {country}"
# the pipeline output exposes the generated images as a list
img_hints = model(prompt=prompt).images
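To make the "single inference step" point concrete, here is a hedged sketch of how the pipeline call could be parameterized. The settings follow the usage recommended for SDXL-Turbo (one step, guidance disabled), but I am assuming them here rather than quoting the project's configuration, and `build_image_call` and `generate_image_hints` are hypothetical helpers.

```python
def build_image_call(country, n_hints=1):
    """Keyword arguments for a single-step SDXL-Turbo call (assumed settings)."""
    return {
        "prompt": f"An image related to the country {country}",
        "num_inference_steps": 1,  # SDXL-Turbo is meant to run in very few steps
        "guidance_scale": 0.0,  # the model is used with guidance disabled
        "num_images_per_prompt": n_hints,  # one image per requested hint
    }


def generate_image_hints(country, n_hints=1):
    """Load the pipeline and return a list of generated images."""
    # Imported lazily so build_image_call works without diffusers installed.
    from diffusers import AutoPipelineForText2Image

    model = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
    return model(**build_image_call(country, n_hints)).images
```

Keeping the arguments in a separate function makes it easy to experiment with more steps or more images per round without touching the model-loading code.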
Audio hints
To generate the audio hints we will be using the AudioLDM2 model. From my experiments with different audio generation models, this one had a good trade-off between the speed and quality of the outputs for this specific use case.
from diffusers import AudioLDM2Pipeline

model = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2-music")
prompt = f"A sound that resembles the country of {country}"
# the pipeline output exposes the generated waveforms as an array
audio_hints = model(prompt).audios
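The generated waveforms are arrays of float samples, so they need to be encoded before they can be played or saved. Below is a minimal standard-library sketch that turns one mono waveform into 16-bit PCM WAV bytes. The 16 kHz sample rate is my assumption about the model's output, and `waveform_to_wav_bytes` is a hypothetical helper, not part of diffusers.

```python
import io
import struct
import wave

SAMPLE_RATE = 16_000  # assumed output rate; adjust if your model differs


def waveform_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode a mono float waveform (values in [-1, 1]) as 16-bit PCM WAV bytes."""
    # Clip each sample to the valid range and scale it to a signed 16-bit integer.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


wav_bytes = waveform_to_wav_bytes([0.0, 0.5, -0.5, 2.0])
```

In the game you would pass something like `audio_hints[0]` instead of the toy list, then write the bytes to a `.wav` file or hand them to whatever audio player your front end provides.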
With this, we conclude the hint-generation process. As you can see, the Hugging Face libraries make our work quite easy here. The main complexity of this app was related to the actual workflow of the Streamlit app. That part is a bit out of scope for this article because it is more technical and specific to that framework, but if you are curious to understand it you can visit the Git repository of this project.
Continue learning
If you want to look into other fun use cases of generative AI applied to games, you might enjoy reading about my other project, Gemini Hangman.
To look into another project using multiple modalities of generative AI, check out my previous article on generating music clips with AI.