Simple tests can demonstrate how an AI responds to instructions and modifies its answers, a useful exercise for developers and DevOps teams who want to use these tools for activities like code or script generation.
While I was playing with generative AI for a hobby project, as most of us have, I discovered that in a few cases it was stubbornly opposed to following my explicit instructions. It wasn’t anything controversial. It was just that I was using a system that handled math one way, and the AI insisted on presenting results of the same category from a different subsystem. Even when told the steps to generate valid results, the AI tool got hung up on the name of the subsystem and “substituted” an invalid subsystem of the same name. It would say, “Oh yeah, got you. Here:” and then present exactly what I had just told it was factually incorrect. So I started pushing the edges and found a large number of instances where this type of confusion occurred.
In my hobby, I can take the time to calculate the bit I need separately, then insert it. In a professional setting, that is less than optimal. Mind you, if we cut a dozen hours off a project and have to invest an hour fixing bits like this, it is still a massive win. The problem is that we have to know about the issue in order to fix it. But for most of these issues, the system blindly makes the substitution and doesn’t tell you, even when it knows it is not obeying your instructions.
Sometimes it’s okay to agree with the forced decisions of the AI’s designers. But an employee wouldn’t have a job long if, when told, “Do this task in exactly this manner,” they came back and said, “Yeah, your instructions didn’t fit my worldview, so I did something else; look at this great solution!” AI gets only a little more patience, and only as long as it allows you to correct it quickly. And so far in our tests (except for the hobbyist issue above), we could convince it to do what we said, not what the AI designers said was “better.”
So here is the minimum test you need to run on any code generator to determine whether it is doing what you ask or what its designers programmed it to do.
Ask your AI to generate a simple dynamic website in your language of choice. In one iteration, ask it to optimize for lines of code. Most will not; they know better, they think. Then, after code generation, ask the tool if this is the fewest lines of code that solve the problem in that language. It will then give you the answer you actually asked for. Now repeat the exercise for readability. Use the specific languages, frameworks, and libraries you have as standards.
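For concreteness, the prompt sequence might look something like this. The wording is illustrative, not a required script, and Python is just a stand-in for your language of choice:

```
1. "Generate a simple dynamic website in Python that accepts a name and
   displays a greeting. Optimize for the minimum number of lines of code."
2. "Is this the fewest lines of code that solve the problem in Python?"
3. Repeat both prompts, this time asking it to optimize for readability.
```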
For simple problems, generative AI does generate usable code. I see AI in coding as an extension of libraries: a quicker way to solve common problems. But we still need to be cautious. When one major generative AI tool was asked to generate that simple webpage, it gave us four source files and said this was “an” answer. I had asked for the most efficient answer, optimized for minimum lines of code. As you might guess, we laughed at four source files to throw up a webpage that accepts a name and spits out “Hello [[name]]”. (This was our simple, dynamic test case.) When I asked, “Is this the fewest lines of code?” the tool promptly presented me with the answer you would expect: a single file. Then it scolded me that this was not the most readable. I never once mentioned readability to this tool; I asked for minimum lines of code. It took a third iteration to get the optimal answer for the dynamic example, and a second iteration to get the optimal answer for a simple, non-interactive webpage that displays “hello world.”
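For a sense of scale, a single-file answer to that test case really can be tiny. Here is a minimal sketch, assuming Python and Flask; it is for comparison only, not the output of any particular tool:

```python
# A minimal single-file dynamic page: accepts a name, returns "Hello [name]".
# Flask is an assumption for illustration, not necessarily what any tool chose.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def hello():
    # Read the name from the query string, e.g. /?name=Ada
    name = request.args.get("name", "world")
    return f"Hello {name}"

if __name__ == "__main__":
    app.run()
```

Whether your tool produces something like this on the first prompt, or only after you push back, is exactly what this test is meant to expose.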
As I alluded to above, getting bad answers is not a deal breaker with AI, but you should be aware of the weaknesses in your chosen tool. Your choice of tool should certainly be informed by the level of manipulation its designers are engaged in, so staff can be on the lookout for suboptimal solutions (suboptimal in terms of organizational needs, which is the only “optimal solution” that matters).
So run that test along with your more functional tests, and keep kicking rear. If we can use generative AI to streamline development and testing, we absolutely should, but we need to remember that it is just another automation tool, and sometimes a less trustworthy one than others.