“Think Twice Code Once”:

Why should we be careful with ChatGPT?

ChatGPT was launched recently and has attracted huge attention. There is great excitement in the IT world, especially among professional groups such as developers, data analysts and data scientists. Emphasizing features of ChatGPT such as writing code or converting code between programming languages, many journalists and professionals claim that it will threaten the jobs of many people who work with code and data, such as developers and data analysts.

People are flocking to ChatGPT, so to speak. On social media and in all kinds of messaging groups, it is very common to come across numerous examples of how ChatGPT can easily handle many coding tasks that we struggle with.

However, we should keep in mind that, although ChatGPT offers many opportunities to make our work easier in the IT world, it is also a human-made product. Therefore, like any product trained on data, it is not free from error. It can make mistakes even on the simplest coding questions. Let’s continue with an example from Python.

As you know, one of the most important parts of data analysis is handling missing values. There are various methods to fill them in, and if we cannot fill a missing value in any way, we resort to dropping it. For this purpose, we generally use the pandas dropna function. One of its parameters is thresh. Let’s check the dropna function and the thresh parameter in the official pandas documentation:
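The fill-first, drop-if-needed workflow described above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical DataFrame and assuming that filling the numeric gap with the column mean is acceptable; in real data the right filling strategy depends on the column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "age" is numeric, "city" is text.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Bochum", "Essen", None, "Dortmund"],
})

# Step 1: fill what we reasonably can, here the numeric gap with the column mean (32.0).
df["age"] = df["age"].fillna(df["age"].mean())

# Step 2: drop the rows that still contain a missing value we could not fill.
cleaned = df.dropna()

print(cleaned)  # rows 0, 1 and 3 remain; row 2 had no city
```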

The dropna function in the official pandas documentation

Source: pandas.DataFrame.dropna

As seen above, dropna works on axis=0, that is, on rows, by default unless stated otherwise. Here is an example of the dropna function and its thresh parameter from the same official source:
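A minimal sketch of what the default axis means in practice, on a small hypothetical DataFrame: the plain call removes rows, and only axis=1 removes columns.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 DataFrame with a single missing value in column "b", row 0.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# Default axis=0: drop every ROW that contains at least one missing value.
rows_dropped = df.dropna()

# axis=1: drop every COLUMN that contains at least one missing value.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)  # (2, 3): row 0 is gone, all columns kept
print(cols_dropped.shape)  # (3, 2): column "b" is gone, all rows kept
```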

An example of the dropna function and its thresh parameter

Source: the DataFrame and the code are taken from pandas.DataFrame.dropna

So if we pass 2 to the thresh parameter, the code keeps only the rows with at least 2 non-NA values and drops the others. That’s why it drops the first row, which has just one non-NA value (“Alfred”).
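The example from the pandas documentation can be recreated in a few lines to see the per-row counts that thresh compares against. The DataFrame below is written to follow the documentation example as closely as possible; the counting line with notna() is an addition for illustration.

```python
import numpy as np
import pandas as pd

# DataFrame modeled on the pandas.DataFrame.dropna documentation example.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Non-NA values per row: Alfred 1, Batman 3, Catwoman 2.
print(df.notna().sum(axis=1).tolist())  # [1, 3, 2]

# Keep only rows with at least 2 non-NA values (axis=0 is the default).
print(df.dropna(thresh=2))  # the "Alfred" row is dropped
```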

Now let’s assume that we don’t understand the thresh parameter and ask ChatGPT to explain it with a concrete example:

ChatGPT’s response on the proper use of the thresh parameter in pandas

As you can see, it dropped columns, although in this example it should have dropped rows: the axis is 0 by default, so this code cannot drop any columns unless the axis parameter is changed to 1.
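To make the difference concrete, here is a minimal sketch on a small hypothetical DataFrame (not the one in ChatGPT’s answer): with the default axis, thresh counts non-NA values per row, and only with axis=1 does it count them per column.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 DataFrame.
df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan, np.nan],  # 1 non-NA value
    "y": [1.0, 2.0, np.nan, 4.0],        # 3 non-NA values
    "z": [1.0, 2.0, 3.0, 4.0],           # 4 non-NA values
})

# Default axis=0: thresh counts non-NA values in each ROW.
# Row counts are [3, 2, 1, 2], so only row 2 falls below thresh=2.
by_rows = df.dropna(thresh=2)
print(by_rows.index.tolist())    # [0, 1, 3]
print(by_rows.columns.tolist())  # ['x', 'y', 'z']: no column is dropped

# axis=1: thresh counts non-NA values in each COLUMN.
# Column counts are x=1, y=3, z=4, so only "x" falls below thresh=2.
by_cols = df.dropna(axis=1, thresh=2)
print(by_cols.columns.tolist())  # ['y', 'z']
```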

Now let’s tell it that it made a mistake and should have worked on rows instead of columns.

My reply to ChatGPT about its mistake, and its correction

As you can see, it admitted its mistake and changed the axis to 1. However, the answer is still wrong, because columns 2 and 3 have 7 and 8 non-null values, respectively, which is more than the threshold of 5. Let’s write the same DataFrame and code in a Jupyter notebook to see the correct answer: when the thresh parameter is set to 5, none of the columns is dropped.
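Before trusting an answer like this, its prediction can be checked in a couple of lines: count the non-null values per column and compare them with thresh. A minimal sketch; the helper columns_dropped_by_thresh and the 8-row DataFrame are hypothetical and are not the data from ChatGPT’s answer.

```python
import numpy as np
import pandas as pd

def columns_dropped_by_thresh(df: pd.DataFrame, thresh: int) -> list:
    """Predict which columns df.dropna(axis=1, thresh=thresh) removes."""
    non_null = df.count()  # number of non-NA values per column
    return non_null[non_null < thresh].index.tolist()

# Hypothetical 8-row frame whose columns have 6, 7 and 8 non-NA values.
df = pd.DataFrame({
    "col1": [1, 2, np.nan, 4, 5, np.nan, 7, 8],
    "col2": [1, 2, 3, np.nan, 5, 6, 7, 8],
    "col3": [1, 2, 3, 4, 5, 6, 7, 8],
})

print(df.count().tolist())  # [6, 7, 8]

# Compare the prediction with what pandas actually does.
for t in (5, 7):
    kept = df.dropna(axis=1, thresh=t).columns
    actual = [c for c in df.columns if c not in kept]
    predicted = columns_dropped_by_thresh(df, t)
    print(t, predicted, actual)  # 5 -> [] [], 7 -> ['col1'] ['col1']
    assert predicted == actual
```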

Output of the same dataframe from Jupyter notebook
The correct answer from Jupyter notebook

But suppose I were a newbie, or an expert who had forgotten how to use the thresh parameter properly, and asked ChatGPT for help. If I didn’t know that the dropna function works on rows by default and trusted the first answer ChatGPT gave me, that would cause a serious problem in my code and in my work as a whole. I could end up unintentionally distorting my data, drawing wrong inferences and, in the end, making wrong decisions.

For this reason, it is useful to remember the famous saying in the world of coding: “think twice code once”. Turning to artificial intelligence such as ChatGPT is clearly attractive in the world of coding. However, never accept its responses as unconditionally true, and don’t forget to check them twice!

Kadir Yildirim

Dr., Ruhr-University Bochum, Data Scientist, interested in digital history!