Artificial Writing: a first evaluation

LIKE every other domain of everyday life, education at all levels has been battered by digital technology even in places where it isn’t called for. Now the alarm has been raised about a new AI program, ChatGPT, which can be used, it is said, to write academic essays. A free trial version of the program was launched at the end of 2022, and gained a million subscribers in the first month. You give it a prompt in natural language and it returns a coherent and apparently cogent text. Before coming to a judgement about it, one should of course try it out, and my first impression is that its essay-writing skills are stilted but it looks like it might make a useful research tool. Its great advantage, after trying it out with a few queries (called prompts), would appear to be its speed. It is much faster than using Google, and where Google delivers a list of results which you then have to trawl through, here you get an immediate answer in formal, polite, and completely impersonal prose.

We have to assume that students are going to start using it, and doubtless discover their own ways of doing so. A quick glance at the website’s Community Forum reveals all sorts of uses, from turning the transcript of an interview into an article to writing bedtime stories. These more complex uses require the developer version, which allows fine tuning, training and longer texts (for which charges are calculated on the number of ‘word-parts’, or tokens, required, such that longer or rarer words are split into more tokens and cost more, though each token is a mere micropayment). My guess is that they’re going to make a great deal of money from commercial users producing ads, emails, blogs and who knows what.

These remarks are based on the free trial version, which is what students will gravitate to, and my tests reveal a number of drawbacks, including certain biases and errors, which will escape the casual or naïve user. Because of this, I’m going to argue that instead of panicking about the disruption this will bring, we need to teach it. This involves two things. On the one hand, not to demonise it, but to learn how to use it prudently and with discrimination. On the other, not to hype it, but to understand that however intelligent it might seem, it is precisely artificial. It is built on what is called a large language model, ‘trained’ on millions of pages of text scraped from the web and other sources, using machine-learning algorithms to generate similar text. It does this by predicting statistically likely continuations of word sequences. It doesn’t ‘know’ anything else, and it only means something when read by human eyes. According to experts, large language models have limited reliability, are prone to bias inherited from the data they were trained on, and lack transparency about how their responses are generated.
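For readers who would rather see the principle than take it on trust, the mechanism can be shown in miniature. The Python sketch below is my own toy illustration, not anything from OpenAI: it ‘trains’ by counting which word follows which in a scrap of text, then generates a continuation by repeatedly sampling a statistically likely next word. ChatGPT does the same thing at vastly greater scale, over sub-word tokens and with a neural network instead of a counting table, but the underlying principle of predicting likely continuations is the same.

```python
import random
from collections import Counter, defaultdict

# Toy illustration only: a real large language model uses a neural
# network over sub-word tokens, not a lookup table of word counts.
corpus = ("the polar ice caps are melting and the polar bears are "
          "struggling and the data speaks for itself").split()

# 'Training': count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(prompt_word, length=8):
    """Continue a prompt by sampling statistically likely next words."""
    words_out = [prompt_word]
    for _ in range(length):
        counts = follows.get(words_out[-1])
        if not counts:              # nothing has ever followed this word
            break
        candidates, weights = zip(*counts.items())
        words_out.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words_out)

print(generate("the"))   # e.g. 'the polar ice caps are melting and the polar'
```

Run on a few sentences, this produces plausible-sounding fragments; run on millions of pages, the same principle produces the polished prose quoted below. Nothing in it knows what the words mean.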

I tried a variety of prompts, suggested to me by things I’ve been thinking about recently, like a topic raised by a PhD thesis I’ve been reading, so not exactly elementary. First up: ‘What space is the internet a representation of? Is it a representational space, or the dissolution of representational space?’ With almost no delay it came up with a perfectly coherent answer in 91 words.* Then I prompted ‘Discuss: Economics is not a science, any more than politics.’ This answer, in 129 words, was also coherent, describing economics as a social science, and then considering politics in the same way, as another social science, but not as an activity pursued by politicians.** An easy elision to make, since the word is ambiguous.

The best test strategy is to ask something you already know about, so I turned to music (which I’m always thinking about), and in order to get a longer answer, asked for 500 words on what Lévi-Strauss says about music. I quickly got back 323 words consisting of four short paragraphs and a summary. I responded ‘This is too repetitive and says nothing about the two grids’, a reference to a key passage about the two grids of the natural and the cultural. The response was ‘You are correct, I apologize for the repetition. In addition to the ideas I previously mentioned, Levi-Strauss also proposed the concept of two musical grids…’ which it explained reasonably well in a couple of short paragraphs and a formulaic conclusion. The same thing happened when I prompted the topic of music and language and then raised objections to something said in the response. Several more tests like this revealed an underlying pattern in the structure and language of the generated texts: formulaic, repetitive, polite, and very general, suggesting some kind of template.

ChatGPT speaks in the first person and addresses the user as ‘you’, and you discover you can pin things down a bit by asking more specific questions, in other words, conversing with it. This carries its own dangers, because the illusion of conversation is beguiling, as Joseph Weizenbaum discovered way back in the 1960s when he wrote a program called ELIZA, designed to imitate a Rogerian psychotherapist engaged in an initial interview with a patient. He was startled, he said, to discover how people responded to the program, ‘how quickly and how very deeply people conversing with [it] became emotionally involved with the computer and how unequivocally they anthropomorphized it’ (Weizenbaum 1984). Unlike the large language model, it was a compact program which fitted onto a floppy disk and circulated widely, and I tried it out when a copy fell into my hands sometime in the 1980s. It provided an hour or two’s entertainment until it couldn’t cope and came up with a grammatically nonsensical response, which made me laugh aloud and gave me the triumphant feeling that I’d outwitted it. As for Weizenbaum, he was even more taken aback by the response of a number of psychotherapists who seemed to seriously believe that such a program could serve to provide an automated system of treatment which would compensate for the scarcity of therapists. A program of this kind is nowadays called a chatbot, and something close to this is coming about, if not in psychotherapy then in the application of AI to medical triage to replace increasingly scarce family doctors. Their efficacy is in question, but at least, if they don’t know the answer, they refer you to someone who does.
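It is worth pausing on how little machinery ELIZA needed to produce its effect. The sketch below is a reconstruction of the general technique in Python, under my own assumptions rather than Weizenbaum’s actual code: a short script of keyword patterns, plus the trick of reflecting the user’s own words back with the pronouns swapped, is enough to sustain the illusion of a listening therapist for a surprising while.

```python
import re
import random

# A minimal ELIZA-style responder (a reconstruction of the technique,
# not Weizenbaum's original code). It matches keyword patterns and
# reflects the user's own words back with the pronouns swapped.

REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

RULES = [
    (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"i am (.*)",   ["Why do you say you are {0}?"]),
    (r"my (.*)",     ["Tell me more about your {0}."]),
    (r"(.*)",        ["Please go on.", "How does that make you feel?"]),
]

def reflect(fragment):
    """Swap first- and second-person words: 'my job' -> 'your job'."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance):
    # Rules are tried in order; the catch-all at the end always matches.
    for pattern, replies in RULES:
        match = re.match(pattern, utterance.lower())
        if match:
            return random.choice(replies).format(
                *(reflect(g) for g in match.groups()))

print(respond("I feel trapped by my work"))
# e.g. 'Why do you feel trapped by your work?'
```

A handful of such rules carried the whole illusion; the grammatically nonsensical response that made me laugh was presumably just an input the patterns couldn’t gracefully handle.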

Not so with ChatGPT, as revealed by more experiments and further perusal of the Community Forum. It turns out that ChatGPT doesn’t search the internet for resources but generates dummy sources instead. Pursuing a question I’m writing something about, I asked it if the informal sector in advanced economies has grown since 2008, and it replied correctly that such data was difficult to gather but some studies suggested that this was the case. I asked for details of these studies and it cited a couple of examples from academic journals but without giving titles or references. When I asked for them the response came back, ‘I apologize for the confusion, but the studies you have mentioned does not have a precise reference as it is not a specific study, it could be a synthesis of several studies done on the topic.’ Curiously, this is the only sentence I got from it with a grammatical error. Perhaps by challenging it I’d got its knickers in a twist, like my encounter with ELIZA (although it took longer to reach the sweet spot). At any rate, it seems to be programmed not to admit its ignorance, although according to a Forum contributor, it could easily be trained into answering ‘?’ to anything it doesn’t know the answer to. I decided it’s best to treat its responses with a pinch of salt.

The invention of non-existent sources alone might be enough to rule ChatGPT out of order for educational purposes, and indeed it has already been banned by a number of journals and universities. But I wonder whether, if a student were to reproduce a false source provided by ChatGPT, it would easily escape attention. I am put in mind of a housemate from my student days who gaily admitted that he’d invented a source in an exam paper for which he’d been awarded an A. Exams or essays, the problem from the point of view of assessment is that, as it is, no-one has time to check every reference (although you can sometimes smell out the bad ones). Students don’t do this either, and sometimes repeat errors, but they’re likely to trust what the program tells them because it does so in an authoritative voice – without realising that this voice is constructed, an artefact designed to be balanced, dispassionate and circumspect. Doubtless the programmers who designed it would say that it isn’t designed but is the algorithmically driven result of machine learning. Yet this does not explain why the program speaks the way it does, with a bias towards a bland imitation of the discourse of the reporter in public service broadcasting (unless you ask for something ‘in the style of…’).

I found a simple test which demonstrated this. I asked for a script for a short experimental video film about climate change. This is what it came up with:

Title: “Melting Point”
Fade in: Establishing shot of the earth from space, zoom in to polar ice caps.
Voiceover: “The earth is in a state of change. The polar ice caps, once symbolising the stability and longevity of our planet, are melting at an unprecedented rate.”
Cut to: Timelapse of polar ice caps melting, intercut with shots of polar animals such as polar bears and walruses, struggling to survive.
Voiceover: “The effects of climate change are far-reaching, impacting the lives and habitats of countless species, and ultimately, our own.”
Cut to: Interior shot of a laboratory. A scientist is shown studying ice samples and recording data.
Scientist: “The data speaks for itself. The polar ice caps are melting at a rate faster than we ever imagined.”
Cut to: Exterior shot of a bustling city, with cars and factories producing smoke.
Voiceover: “Human activity is the primary cause of this melting, our rampant use of fossil fuels, deforestation, and other unsustainable practices.”
Cut to: Various shots of people going about their daily lives, using plastic, driving cars, using energy-inefficient appliances.
Voiceover: “We must make changes to our way of life, to preserve the future of our planet, and all the species that call it home.”
Cut to: A person planting a tree.
Voiceover: “Small changes can make a big impact. Start by reducing your carbon footprint, recycle, use energy-efficient appliances, and make conscious choices.”
Cut to: Establishing shot of the earth from space, zoom out to show the entire universe.
Voiceover: “The future is in our hands. It’s time to take action, before it’s too late.”
Fade to black. End.

There are immediately two things to say about this. Notice, first, that in constructing a coherent argument and avoiding any whiff of denialism, it’s as if it assumes what lies behind the request and responds accordingly. I’m not complaining, of course, but I’d like to know how it does this. Secondly, it isn’t experimental. In aesthetic terms, it’s utterly conventional. It doesn’t understand what ‘experimental’ means, at least in this context. When I prompted it with the objection ‘This is not experimental. Try again, without using voiceover’, it came back with almost the same thing, sans the spoken bits, slightly re-ordered. But I have to admit that if this were a treatment submitted by a first-year undergraduate, I have little doubt that I would consider it a good enough starting point, although the question would then be what they made of it.

And where, I wonder, does that leave us? There are already video programs which offer to organise your footage for you. I can imagine a version of ChatGPT trained on YouTube – I expect someone’s already working on it – which would turn its own script into a film. I can’t imagine it would show any originality (except by accident), any spark of the thought and creativity we hope to coax out of the student. But that’s not what will drive the adoption of artificial filmmaking by commercial content providers. Or is that just doom-saying?

____________

*  ChatGPT: The internet can be considered a representational space in the sense that it is a virtual space where information and data are represented in various forms, such as text, images, videos, and audio. However, the internet also challenges traditional notions of representational space by allowing for the creation of new forms of representation and the dissolution of boundaries between physical and virtual spaces. Additionally, the internet enables the sharing and dissemination of information on a global scale, breaking down traditional barriers and blurring the lines between different cultures, communities, and countries.

** ChatGPT: Economics is often considered a social science, as it involves the study of how individuals, groups, and societies make decisions about how to allocate scarce resources. While it does not have the same level of empirical precision as natural sciences like physics or chemistry, economics uses a variety of tools, such as mathematical modeling and statistical analysis, to make predictions and understand complex systems. Similarly, politics is also a social science, it studies the allocation and transfer of power in decision making within groups or societies. Both economics and politics are complex fields that involve many different factors and can be difficult to predict or understand, but they are still considered sciences in the sense that they use systematic methods to study and understand human behavior and social phenomena.

REFERENCES

Joseph Weizenbaum, Computer Power and Human Reason, Penguin Books, 1984. 
