
What is often overlooked about “Large Language Models” (LLMs) is that they were primarily trained on English content and were designed to be used primarily in English. What is more, there is a cultural assumption that writing direct, highly detailed prompts is second nature to most people. Even as an English speaker myself, I have had to think deeply about how to write good prompts to get the results I want. Given the way the prompt engineering cycle works, it is most efficient to generate something useful in as few prompts as possible.
However, in our interactions with Japanese users, we have seen that prompt engineering can be a frustrating experience. The best strategy for prompt engineering is to state exactly what the end goal should be in simple, direct language. For example, “make this house prettier” is far less effective than “add white tile to the bottom half of the house, add a glass carport cover over the parking space, and change the roof to blue tile”. For many people, this sort of directness takes practice to get accustomed to. As such, what may seem like a culturally normal way to ask a person to do something often does not work for prompt engineering.

To help remedy this, we have added an option in ArchiX to make this experience easier for our users. In our app, users select from a drop-down of predefined actions like “add”, “replace”, or “remove”, then fill in a few basic user-defined fields that tell our AI service what to act upon and how. Our model interprets this structured input more reliably across languages, and it usually takes fewer generations to reach the desired result.
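The flow above can be sketched as a simple template step: a predefined action plus user-filled fields are flattened into a single direct instruction before being sent to the image model. This is a minimal illustration, not ArchiX's actual code; the action names, templates, and function name are assumptions for the example.

```python
# Hypothetical sketch: flattening a structured "promptless prompt"
# form into one direct editing instruction. The templates below are
# illustrative, not ArchiX's actual implementation.

TEMPLATES = {
    "add": "add {detail} to {target}",
    "replace": "replace {target} with {detail}",
    "remove": "remove {target}",
}

def build_prompt(action: str, target: str, detail: str = "") -> str:
    """Combine a predefined action with user-defined fields."""
    template = TEMPLATES.get(action)
    if template is None:
        raise ValueError(f"unsupported action: {action}")
    return template.format(target=target, detail=detail).strip()

# A user fills in only short fields; the direct phrasing that prompt
# engineering rewards is supplied by the fixed template structure.
print(build_prompt("replace", "the roof", "blue tile"))
# replace the roof with blue tile
```

Because the structure is fixed, the fields themselves can be short, and they can be machine-translated or localized independently of the template wording.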

To understand how this helps, it is important to understand how generative AI edits an image with a text prompt. The service performs a process called latent mapping, in which the image is broken up into recognizable pieces that the model can identify, such as cars, windows, doors, and trees. This segmented representation lives in what is called the latent space, where the AI service has organized the image in a way it can easily reference. The text prompt is then matched against this representation to target what the user wants to change about the image. Once the target area is understood, a cycle of layered refinement steps shifts that region toward the intended goal of the text prompt.
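The targeting idea in that cycle can be illustrated with a toy sketch: once segmentation has produced a mask over the target (say, “the roof”), each refinement step only moves masked values toward the edit, leaving everything else untouched. Real systems do this in a learned latent space with a diffusion model; plain Python lists stand in here purely to show the masking logic, and all names and values are invented for the example.

```python
# Toy sketch of region-targeted editing: only positions selected by
# the segmentation mask are blended toward the edit; the rest of the
# "image" is left exactly as it was.

def masked_edit_step(latent, edit, mask, strength=0.5):
    """Blend edited values into masked positions only."""
    return [
        (1 - strength) * v + strength * e if m else v
        for v, e, m in zip(latent, edit, mask)
    ]

latent = [0.0, 0.0, 0.0, 0.0]        # stand-in for the encoded image
edit   = [1.0, 1.0, 1.0, 1.0]        # stand-in for the prompt's goal
mask   = [False, True, True, False]  # target region from segmentation

for _ in range(3):                   # the iterative refinement cycle
    latent = masked_edit_step(latent, edit, mask)

print(latent)  # unmasked positions are unchanged: [0.0, 0.875, 0.875, 0.0]
```

Each pass moves the masked region halfway toward the goal, which is why a few iterations are needed; the unmasked positions never change, which is what makes the edit targeted rather than global.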
Our promptless prompt tool helps because, in the background, we have predefined how to target the latent space; the user input merely fills the last gap of what to target and how to change it. This makes for a far smoother and more efficient experience, especially for users unfamiliar with how prompt engineering works.