
How do you evaluate whether your software does what it's supposed to do?

Do you test all your app's possible cases, branches and states? I don't, at least not manually. Nobody has time to click through every edge case by hand. QA'ing a simple login form takes time, let alone a complex application.

Having robots do that helps a ton, and I recommend writing automated tests to help you sleep well at night (and release fewer bugs)!

Setting aside the burden of writing and maintaining tests, testing a "normal" web application is straightforward because it's predictable: throw something at your app and expect a result. The same input should always produce the same output. Most apps are CRUD apps anyway. Easy peasy.
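Here's a minimal Python sketch that makes "same input, same output" concrete. The `validate_login` function and its rules are hypothetical and exist only for illustration. Because the function is deterministic, plain exact-match assertions are all you need to test it.

```python
import re

# Hypothetical login-form validator: a pure, deterministic function.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_login(email: str, password: str) -> list[str]:
    """Return a list of validation errors; an empty list means the input is valid."""
    errors = []
    if not EMAIL_PATTERN.match(email):
        errors.append("invalid email")
    if len(password) < 8:
        errors.append("password too short")
    return errors


# Same input, same output, every single run, so exact matches are safe.
assert validate_login("stefan@example.com", "correct-horse") == []
assert validate_login("not-an-email", "short") == ["invalid email", "password too short"]
```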

But what if there are unpredictable parts in your app's core?

If you're riding the AI buzzword wave, you've probably put an "I know everything" smart-ass, one known for lying and spreading fake news, right into your app's core. (Yes, I mean some sort of LLM.)

How would you test your app's quality if you're building software on top of software you probably don't understand?

Here's Hamel Husain's recommendation:

There are three levels of evaluation to consider:

  • Level 1: Unit Tests
  • Level 2: Model & Human Eval (this includes debugging)
  • Level 3: A/B testing
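What could a Level 1 unit test look like when the wording changes on every run? Below is a minimal sketch based on my own assumptions, not on Hamel's actual code. Instead of comparing exact strings, it asserts properties the output must always have: no leaked internal IDs, required facts mentioned, and a length limit. `generate_reply` is a hypothetical stub that stands in for a real LLM call so the example runs on its own. `check_reply` and its specific rules are also made up for illustration.

```python
import re

# Matches UUIDs such as internal record IDs that should never reach users.
UUID_PATTERN = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def generate_reply(prompt: str) -> str:
    """Hypothetical stub for an LLM call; a real model would vary its wording."""
    return "Your order #1042 ships on Monday. Anything else I can help with?"


def check_reply(reply: str, must_mention: list[str], max_chars: int = 500) -> list[str]:
    """Return the names of failed checks; an empty list means the reply passed."""
    failures = []
    if UUID_PATTERN.search(reply):
        failures.append("leaked an internal UUID")
    for phrase in must_mention:
        # Case-insensitive, so "monday" and "Monday" both count.
        if phrase.lower() not in reply.lower():
            failures.append(f"missing '{phrase}'")
    if len(reply) > max_chars:
        failures.append("reply too long")
    return failures


reply = generate_reply("When does order 1042 ship?")
print(check_reply(reply, must_mention=["1042", "Monday"]))  # []

bad_reply = "Order 3f2b8c1e-9a4d-4e2f-8b1a-0c9d7e6f5a43 ships soon."
print(check_reply(bad_reply, must_mention=["Monday"]))
# ["leaked an internal UUID", "missing 'Monday'"]
```

With a real model behind `generate_reply`, you'd probably run checks like these over many prompts, and several times per prompt. That turns the result into a pass rate rather than a single green checkmark.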

I'm not planning to get into serious AI work or LLM programming anytime soon, but unit testing software sitting on top of LLMs is fascinating and worth more than a bookmark!


About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
