I put OpenAI’s o1-preview through my 4 AI coding tests. It surprised me (in a good way)

sankai/Getty Images

Normally, when a software company pushes out a major new release in May, it doesn’t try to top it with another major new release four months later. But there’s nothing normal about the pace of innovation in the AI business.

Although OpenAI dropped its new omni-powerful GPT-4o model in mid-May, the company has been busy. As far back as last November, Reuters published a rumor that OpenAI was working on a next-generation language model, then known as Q*. Reuters doubled down on that report in May, stating that Q* was being worked on under the code name Strawberry.

Strawberry, as it turns out, is actually a model called o1-preview, which is available now as an option to ChatGPT Plus subscribers. You can choose the model from the selection dropdown:

menu — Screenshot by David Gewirtz/ZDNET

As you might imagine, if there’s a new ChatGPT model available, I’m going to put it through its paces. And that’s what I’m doing here.

The new Strawberry model focuses on reasoning, breaking down prompts and problems into steps. OpenAI showcases this approach through a reasoning summary that can be displayed before each answer.

When o1-preview is asked a question, it does some thinking and then displays how long that thinking took. If you toggle the dropdown, you’ll see some of the reasoning. Here’s an example from one of my coding tests:

It’s good that the AI knew enough to add error handling, but I find it interesting that o1-preview categorizes that step under “Regulatory compliance”.

I also discovered that the o1-preview model provides more exposition after the code. In my first test, which created a WordPress plugin, the model provided explanations of the header, class structure, admin menu, admin page, logic, security measures, compatibility, installation instructions, operating instructions, and even test data. That’s a lot more information than previous models provided.

But really, the proof is in the pudding. Let’s put this new model through our standard tests and see how well it works.

1. Writing a WordPress plugin

This simple coding test requires knowledge of the PHP programming language and the WordPress framework. The challenge asks the AI to write both interface code and functional logic, with the twist being that instead of removing duplicate entries, it has to separate the duplicate entries so they are not next to each other.

The o1-preview model excelled. It presented the UI first as just the entry field:

entry-field — Screenshot by David Gewirtz/ZDNET

Once the data was entered and Randomize Lines was clicked, the AI generated an output field with properly randomized output data. You can see how Abigail Williams is duplicated and, in compliance with the test instructions, the two entries are not listed side by side:

output-data — Screenshot by David Gewirtz/ZDNET

In my tests of other LLMs, only four of the 10 models passed this test. The o1-preview model completed this test perfectly.
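To make the “separate the duplicates” twist concrete, here is a minimal Python sketch of one way to do it. This is not the PHP plugin code from the test, and the function name `shuffle_separating_duplicates` is hypothetical. It assumes a greedy approach: always emit the most frequent remaining line that differs from the one just emitted, with random tie-breaking. That favors frequent lines early, so it is not a uniform shuffle, and if one line makes up more than half the input some copies will inevitably end up adjacent.

```python
import heapq
import random
from collections import Counter


def shuffle_separating_duplicates(lines, rng=None):
    """Return lines in a randomized order, keeping identical lines apart where possible."""
    rng = rng or random.Random()
    counts = Counter(lines)
    # Max-heap keyed on remaining count (negated); the random number breaks ties.
    heap = [(-count, rng.random(), line) for line, count in counts.items()]
    heapq.heapify(heap)

    result = []
    held = None  # entry just used, withheld for one round so it cannot repeat
    while heap:
        neg_count, _, line = heapq.heappop(heap)
        result.append(line)
        if held is not None:
            heapq.heappush(heap, held)
        remaining = neg_count + 1  # counts are negative, so +1 consumes one copy
        held = (remaining, rng.random(), line) if remaining < 0 else None

    # Leftover copies of a single over-represented line cannot be separated.
    if held is not None:
        result.extend([held[2]] * (-held[0]))
    return result


# Sample data for illustration only (names other than the article's are made up).
sample = ["Abigail Williams", "John Smith", "Abigail Williams", "Jane Doe"]
print(shuffle_separating_duplicates(sample))
```

The one-round “hold” is what keeps a line from being picked twice in a row; everything else is ordinary heap bookkeeping.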

2. Rewriting a string function

Our second test fixes a regular expression in a string function, a bug that was reported by a user. The original code was designed to test whether an entered number was valid for dollars and cents. Unfortunately, the code only allowed integers (so 5 was allowed, but not 5.25).

The o1-preview LLM rewrote the code successfully. The model joined four of my previous LLM tests in the winners’ circle.
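As a rough illustration of the kind of fix involved, the sketch below validates a dollars-and-cents amount in Python. It is not the code from the test. The pattern and the name `is_valid_amount` are assumptions, and the rule it encodes (one or more digits, optionally followed by a decimal point and one or two digits) is only one plausible reading of “valid for dollars and cents”.

```python
import re

# Assumed rule: whole dollars, optionally followed by "." and one or two cent digits.
AMOUNT_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def is_valid_amount(text):
    """Return True if text looks like a dollars-and-cents amount such as 5 or 5.25."""
    # fullmatch requires the whole string to match, so trailing junk is rejected.
    return AMOUNT_PATTERN.fullmatch(text.strip()) is not None


for candidate in ["5", "5.25", "5.255", "abc"]:
    print(candidate, is_valid_amount(candidate))
```

An integer-only pattern such as `[0-9]+` would accept 5 but reject 5.25, which matches the behavior the user reported.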

3. Discovering an annoying bug

This test was created from a real-world bug I had difficulty resolving. Identifying the root cause requires knowledge of the programming language (in this case PHP) and the nuances of the WordPress API.

The error messages provided were not technically accurate. They referenced the beginning and the end of the calling sequence I was running, but the bug was related to the middle part of the code.

I wasn’t alone in struggling to solve the problem. Three of the other LLMs I tested couldn’t identify the root cause of the problem and recommended the more obvious (but wrong) solution of changing the beginning and ending of the calling sequence.

The o1-preview model provided the correct solution. In its explanation, the model also pointed to the WordPress API documentation for the functions I used incorrectly, providing an added resource to learn why it had made its recommendation. Very helpful.

4. Writing a script

This challenge requires the AI to integrate knowledge of three separate coding spheres: the AppleScript language, the Chrome DOM (how a web page is structured internally), and Keyboard Maestro (a specialty programming tool from a single programmer).

Answering this question requires an understanding of all three technologies, as well as how they have to work together.

Once again, o1-preview succeeded, joining only three of the other 10 LLMs that have solved this problem.

A very chatty chatbot

The new reasoning approach for o1-preview certainly doesn’t diminish ChatGPT’s ability to ace our programming tests. The output from my initial WordPress plugin test, in particular, seemed to function as a more sophisticated piece of software than previous versions.

It’s great that ChatGPT provides reasoning steps at the beginning of its work and some explanatory data at the end. However, the explanations can be chatty. I asked o1-preview to write “Hello world” in C#, the canonical test line in programming. This is how GPT-4o responded:

csharp-gpt4o — Screenshot by David Gewirtz/ZDNET

And this is how o1-preview responded to the same test:

csharp — Screenshot by David Gewirtz/ZDNET

I mean, wow, right? That’s a lot of chat from ChatGPT. You can also flip the reasoning dropdown and get even more information:

csharp-thinking — Screenshot by David Gewirtz/ZDNET

All of this information is great, but it’s a lot of text to filter through. I prefer a concise explanation, with additional information options in dropdowns removed from the main answer.

Yet ChatGPT’s o1-preview model performed excellently. I look forward to seeing how well it will work when integrated more fully with the GPT-4o features, such as file analysis and web access.

Have you tried coding with o1-preview? What were your experiences? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
