My Experiment with GPT and the End of Writing as We Know It

OpenAI’s GPT (Generative Pre-trained Transformer) platform has made waves in the realm of natural language processing (NLP) and AI. The original was impressive, GPT-2 was initially deemed too dangerous to release in full, and GPT-3 is currently invite-only and locked behind an API at that. I’d been fascinated by these developments but never really thought much about them until recently.

Sure, the samples were impressive, but very little felt like it transcended fancy curation of Markov chain generated content. Some were better than others, but if you run the same randomized generation enough times, you’re bound to get something interesting. GPT-3 was the first one that felt like it really rose above the level of a glorified fortune cookie generator or neurally networked apes banging away on typewriters.

As I dug through the research, I found more promise in GPT-2 than I had expected and decided to conduct an experiment. I wanted to see if I could use AI solutions to augment my writing. Could I use GPT to fill boilerplate sections? Could I even use it to draft partial articles to fill in later? I had no idea.

This preliminary experiment is part of a series I plan to conduct on the viability of GPT-2 as a tool within every writer’s grasp. GPT-3 is out of my reach without an invite, so I use what I can. While this experiment may not say a huge amount about the science and true potential of GPT-2, it definitely shows how easy (or not) it currently is to leverage AI to augment writing. How close is this to stealing your job as a writer, and how worried should you be?

Why I Delved into NLP and AI

To begin answering how threatened you should feel as a writer, it’s important to understand why this experiment even came to be. Judging from the samples I had seen and the barriers to entry, I didn’t feel like AI had really become a commercial contender for automating writers out of their jobs. I figured the published samples were either the cream of the crop or just selective picks from our previously mentioned apes on typewriters. That was, until we ordered some books for my kid from China.

My wife ordered some kids’ books that were a series of stories around a vague theme, with the standard illustrations to go along. The thing was, none of the text read organically. It was the “colorless green ideas sleep furiously” of children’s entertainment. Each story read like a chess engine sabotaging its own gameplay while setting up plans to kick off 20-30 moves later. It was blundering genius in a way that eluded human rationale.

These books were generated by a computer. The more I looked into them, the more each story’s length seemed to coincide with the sweet spots of modern NLP systems as well. I obviously can’t prove they weren’t human generated, but if they were, we’re dealing with someone whose thought process is so alien yet so familiar that it hits the uncanny valley.

After I put the books down, I thought: if they can do it for penny-priced kids’ books, why couldn’t I incorporate the same techniques to reduce the effort of writing? I would still need to go through, clean things up, and stitch content together, but what if I could trade an hour of work for two hours of results or more? Maybe this was a delusion of grandeur, or maybe it was like buying into Bitcoin when it was worth pennies.

GPT-2 in Context

GPT-2 was released in stages over 2019, with the full model arriving in November of that year. It’s not really old, but GPT-3 has been out for a bit, and there are multiple other models. I settled on GPT-2 because it was well documented, could run on my system, and was easily available.

There are platforms and models like BERT and T5 from Google, GPT for English (and maybe other languages), and Baidu’s ERNIE for Chinese language processing. There are a huge number of frameworks, models, and methodologies floating around, each with its own advantages and disadvantages. You can find overviews of the more common models elsewhere as well.

GPT isn’t even close to the only horse in the race, but it is one of the best known. This article would stretch to pages and pages of academic text if I wrote about the ins and outs of each model (also, I would have to actually understand each one). The takeaway for my experiment is that GPT-2 isn’t the newest, but it is the easiest for an individual to get started with without needing to invest too much.

In the GPT family, GPT-2 has around 1.5 billion parameters while GPT-3 has 175 billion. GPT-2 was trained on 40GB of text and T5 was trained on around 7TB. While these numbers are all impressive, you end up comparing apples and oranges if you dive in too deep without understanding the implications of the training sets and the like. For instance, an increase of over 100 times in parameter count from GPT-2 to GPT-3 doesn’t translate to 100 times the accuracy, efficiency, etc.

Barriers to Entry

I went through a lot of pain to get this experiment set up. First off, the computer I originally tested on died (a hardware failure past the point of being economically worth fixing). Secondly, some of the required packages now need special setup on modern Linux systems (i.e., newer distributions have moved past the supported versions of the requisite packages).

I followed a tutorial to set up an initial Docker container to begin testing the whole system. It took a lot of troubleshooting following guides like this one (there were others, but it will depend on your system), messing with PPAs (and more magic with apt), changing imported package names (every instance of import tensorflow.compat.v1 as tf needed to be changed to import tensorflow as tf in the source I had), and removing conflicting network packages (connman on Debian apparently breaks Docker when Network Manager is installed). I’m sure this is all doable on Windows, but I have no idea how I would do it besides just installing the Windows Subsystem for Linux.
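If you hit the same TensorFlow import mismatch, a small shim like the one below (my own workaround, not part of the GPT-2 repo) saves you from editing every file by hand. It assumes the only difference between environments is whether the compat.v1 alias exists.

try:
    # TF 1.14+ and TF 2.x expose the 1.x API under compat.v1
    import tensorflow.compat.v1 as tf
    if hasattr(tf, "disable_v2_behavior"):
        tf.disable_v2_behavior()  # keep the graph/session semantics GPT-2 expects
except ImportError:
    # Older TF 1.x images (like the one in my container) don't ship compat.v1
    import tensorflow as tf

print("Using TensorFlow", tf.__version__)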

On top of the administrative requirements, you also have requirements for RAM and processing. There are 4 available trained models at 124M (parameters, I believe), 355M, 774M, and 1558M. My new laptop is an AMD system, so CUDA was out of the question. My machine is also limited to 8GB of RAM (for only dollars a day you can ensure an admin is not RAM constrained). The 355M model hit around 2GB of RAM, and 774M hit around 4GB. Extrapolating, 1558M would land somewhere in the 6-8GB range, but I don’t use swap and wasn’t about to test just how lean the rest of my machine was after the results I had seen. My laptop has a beefy CPU, so the results didn’t take horribly long to get, around 10-20 minutes for the longest waits.

Methodology

I tested 3 of the 4 models (124M, 355M, and 774M) with the interactive script (interactive_conditional_samples.py). While I used multiple temperature values, played with the top_k parameter, and more, I did not bother with training or the like at this juncture. I made use of both the summarization feature ([text] TL;DR: — the trailing space was listed as required in the documentation I read) and generating text based on an input prompt.
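For reference, here’s a rough sketch of how I framed the summarization runs, assuming the fire-based flags (model_name, temperature, top_k, length) in the copy of interactive_conditional_samples.py I had; check your copy’s interact_model() signature. The helper function is my own, not part of the repo.

# Invocation, roughly:
#   python3 src/interactive_conditional_samples.py \
#       --model_name=355M --temperature=0.7 --top_k=40 --length=200

def make_tldr_prompt(text: str) -> str:
    # Collapse to a single line (the script reads one input() line) and append
    # the TL;DR cue; the trailing space was listed as required in the docs I read.
    return " ".join(text.split()) + " TL;DR: "

if __name__ == "__main__":
    article = open("draft_section.txt").read()  # hypothetical file with text to summarize
    print(make_tldr_prompt(article))  # paste the output at the "Model prompt >>>" line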

My reasoning for this methodology is that this is how I would attempt to generate actually usable content. I would want the machine to either summarize the boring parts or generate new bits based on previously generated content. Also, it seems madness to begin training a machine learning platform without having a baseline, knowing what weaknesses it has, and so on.

I wanted to feel out where the lowest-hanging fruit of AI is, since that’s going to determine how efficient it is at running me out of my writing job. A hard-to-use system creates jobs for the people who fix and maintain it, and can’t just trash the market the way something anyone can easily run can. GPT-2 was the easiest system to get up and running without any investment outside of time.

I ran multiple prompts through the system, ranging from text generation to summarization. I won’t include a full dump of everything (as some of it may qualify as unintentional plagiarism), but I will include the highlights. These are nowhere close to everything I got, just the ones I took note of after multiple runs.

To summarize the process, I ran through my Docker image and generated somewhere around a hundred snippets. I used varying lengths, varying temperatures, different models, etc., while controlling for the prompts (sometimes). The prompts I used were a mix of random thoughts, random questions, random things snagged off the internet, and samples of my own writing.
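Something like the following would keep a sweep like mine organized (the flag names again assume the copy of the script I had, and the exact values are illustrative rather than a record of my runs):

import itertools

prompts = ["What is fracking?", "What is SEO?", "Tell me a story about owls."]
models = ["124M", "355M", "774M"]
temperatures = [0.4, 0.7, 1.0]  # I ranged from 0 to 1 in practice

# Print the command for each model/temperature pair, then run them by hand
# inside the container, pasting a prompt at each "Model prompt >>>" line.
for model, temp in itertools.product(models, temperatures):
    print(f"python3 src/interactive_conditional_samples.py "
          f"--model_name={model} --temperature={temp} --top_k=40")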

Results

Disclaimer: If any of this text is copyrighted, please inform me and I will remove it. This is all generated from GPT-2 directly and should be fair use, but I’d rather not get DMCA’ed to find out. Also, I’m mainly going to post the crazier stuff.

Some of my earliest results were somewhat passable. The problem with some of them is that they either summarized a bit too closely to the source material or weren’t necessarily correct. For instance:

Model prompt >>> What is fracking?
======================================== SAMPLE 1 ========================================

Fracking is an unconventional method of extracting natural gas from shale rock. The process uses high pressure water to crack open the rock and release the gas. The gas is then injected into the ground to release water.

The process is controversial because it can release large amounts of methane, a greenhouse gas that can cause global warming.

The Environmental Protection Agency (EPA) has said that fracking is safe and that the process is safe for drinking water.

What are the environmental benefits of fracking?

Fracking has been used in the United States since the 1970s, but it has only been used in the past two years.

The Environmental Protection Agency (EPA) has said that fracking is safe and that the process is safe for drinking water.

The EPA also said that the process can help reduce the amount of water needed to produce gas.

What is the impact of fracking on the environment?

The EPA has said that fracking can reduce the amount of water needed to produce gas.

The process also has the potential to release large amounts of methane into the atmosphere.

What are the environmental benefits of fracking?

Fracking has been used in the United States since the 1970s, but it has only been used in the past two years.

The EPA has said that fracking is safe and that the process is safe for drinking water.

The above has some potential, but there’s also a lot to clean up. Running through GPT-2 felt like a trip for a bit until I hit the following:

Model prompt >>> What is SEO Search Engine Optimization?
======================================== SAMPLE 1 ========================================

SEO is the process of finding and ranking for relevant keywords in search engines.

SEO is the process of finding and ranking for relevant keywords in search engines.

SEO is the process of finding and ranking for relevant keywords in search engines.

SEO is the process of finding and ranking for relevant keywords in search engines.

SEO is the process of finding and ranking for relevant keywords in search engines.

SEO is the process of finding and ranking for relevant keywords in search engines.

While my phrasing may seem a bit awkward, it’s because I was trying to get out of the patterns I had run into with some other similar searches:

Model prompt >>> What is SEO?
======================================== SAMPLE 1 ========================================

SEO is the process of creating a website that is easy for people to find, and easy for them to find a site that they want to buy from. It’s a process that takes a lot of time and effort, and it’s not something that you can just do on your own.

The best way to learn how to do SEO is to read the articles that I’ve written on this blog.

If you’re new to SEO, the first thing you should do is to read the articles that I’ve written on this blog.

Now, if you’re a seasoned SEO, you should read the articles that I’ve written on this blog.

If you’re not a seasoned SEO, you should read the articles that I’ve written on this blog.

If you’re not a seasoned SEO, you should read the articles that I’ve written on this blog.

Now, if you’re a seasoned SEO, you should read the articles that I’ve written on this blog.

Now, if you’re a seasoned SEO, you should read the articles that I’ve written on this blog.

And:

Model prompt >>> What is Search Engine Optimization?
======================================== SAMPLE 1 ========================================

Search Engine Optimization is a process that helps you rank higher in search results.

The process is called “optimization” because it helps you rank higher in search results.

What is the difference between a “Rank” and a “Rank” Score?

A “Rank” is a numerical score that is used to rank sites on a website.

A “Rank” Score is a numerical score that is used to rank sites on a website.

A “Rank” Score is a numerical score that is used to rank sites on a website.

A “Rank” Score is a numerical score that is used to rank sites on a website.

Obviously, this seems a bit one-sided since the exact same topics kept having the exact same content problems, but I got similar levels of chaos with different topics.

I cycled back to my question about fracking (it gives me an indication of what political leanings, if any, might be baked into the system; it’s also one of the few times I wish I had increased the length):

Model prompt >>> Tell me how fracking works.
======================================== SAMPLE 1 ========================================

Lackey:

It’s complicated. It’s really expensive.

SPONSORED

Donald Trump:

I think it’s actually pretty simple.

Lackey:

I’m sorry.

Donald Trump:

You know, I like to see the best in some of my employees.

Lackey:

I like to see the best in some of my employees?

Trump:

Totally.

Lackey:

I’m sorry.

Trump:

Mostly.

Lackey:

I have a question.

Trump:

My first question is: Why do you do what you do?

Lackey:

Because I like my job.

Trump:

You do?

Lackey:

Why?

Trump:

Because I like my job, and I love the fact that we’re not paying taxes.

Lackey:

Well, we’re not paying taxes.

Trump:

And you’re saying you’re going to do everything that you can to make sure that we’re not paying taxes.

Lackey:

I’m saying that I don’t like it. I want my money back. It’s just that.

Trump:

You know, I don’t like that, and I’m [sic]

Arguably the best piece I got was this work of art:

Model prompt >>> Tell me a story about owls.
======================================== SAMPLE 1 ========================================

Because I’d been working with owls for a long time, I’d heard this story about owls who could see through the clouds. I knew it was true. I knew they could see through the clouds through the clouds if they took a fancy to a cloud. I was so excited by the idea that I started to run around the house and tell everyone I knew about it! I got off on the wrong foot when I realized that it was true because I’d never seen an owl during a cloud. I started asking everyone what clouds they saw. The answer was always the same. They were falling into the clouds! I was so happy to find out that owls could see through clouds, but there was one big problem with it. What if owls couldn’t see through the clouds? What if there were owls who couldn’t see through the clouds?

Let’s try to figure that out!

Woo hoo! Owls can’t see through the clouds. Why would they need to? They only see dust, rain, and wind. Why would they need to see an all-encompassing cloud like the sky? Well, they used to.

Woo hoo! Owls can’t see through the clouds. Why would they need to? They only see dust, rain, and wind. Why would they need to see an all-encompassing cloud like the sky? Well, they used to.

Woo

I have some wilder results, but they’re lifted wholesale from original content (either in the prompts or in the results themselves).

Potential Improvements

Obviously, this was my first rodeo with GPT-2. The results I got are a bit crazy, but I really didn’t get anything that useful once I factored in the editing time and effort required to make everything make sense. The juice wasn’t worth the squeeze so far, but it probably would be with some changes to the underlying process.

The other glaring omission is that I didn’t use any of the other platforms. I needed to start somewhere, and this one has the lowest barrier to entry at a price that’s more than right. Obviously, using a theoretically and practically more advanced framework will get better results, but automation is like any technology: it’s not ubiquitous until the cheaper models fill the same space. GPT-2 may be weaker, but when the free, easy option is a threat, that’s when the job market trembles with fear.

Down the line, I need to experiment with training the system on texts more relevant to what I’m doing. This system is fairly generic, as it’s trained on general text from the internet rather than domain-specific works. The results I’ve seen are much more congruent with the sources when someone actually takes the time to fine-tune the models.
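For illustration, one low-effort route I’m eyeing (not something I’ve run yet) is the third-party gpt-2-simple package, which wraps checkpoint download, fine-tuning, and generation. The calls below follow its documented usage, but treat the corpus file name and step count as placeholders.

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="355M")       # fetch the pretrained checkpoint

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="my_articles.txt",    # hypothetical corpus of my own writing
              model_name="355M",
              steps=1000)                   # illustrative; tune to taste

print(gpt2.generate(sess, return_as_list=True)[0])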

Some of the options I used may need tweaking as well. I played with temperatures ranging from 0 to 1 and multiple top_k values, but the biggest differentiation came from the seed in my runs. While I didn’t set the seed explicitly, the RNG definitely impacted otherwise identical runs. Running the same query ten times got me wildly differing results. Most were mediocre, some were absolutely awful, and every once in a while I got something okay (I never really got anything amazing, though).
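If I rerun this, pinning the randomness down would make comparisons fairer. The copy of the script I used exposes a seed through its fire interface (check yours), and inside the code the same effect comes from seeding the RNGs the sampling path touches — a sketch, assuming TF 1.x:

#   python3 src/interactive_conditional_samples.py \
#       --model_name=355M --temperature=0.7 --top_k=40 --seed=42

import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)
tf.set_random_seed(SEED)   # TF 1.x API; on TF 2.x use tf.random.set_seed(SEED)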

How Threatened Are Writers by GPT and Its Ilk?

With training and a more systematic approach to text generation, GPT-2 could provide me with something potentially useful. I don’t know if the juice will be worth the squeeze, but I do know that with current results, it will take more than one of the most advanced models of 2019 to unseat me from my job. Newer models may be harder to compete with, but they also come with costs that make them less accessible.

GPT-3 is limited to people who make it into the beta (as of writing), and I don’t even know how to practically get started with T5 and the others. The underlying technology is available (I think), but it isn’t accessible to just anyone. The weapons lie waiting in ivory towers rather than on the ramparts. OpenAI has made the moral assertion that this should remain the case, too.

In other languages, the threat is a bit more profound. The books I read in Mandarin may make little humanly logical sense, but kids enjoy the stories anyway. I don’t see too many years left on the calendar before my writing career has to leverage technology (beyond spellcheck) to keep ahead. The average writer has a bit of time, and then they’ll be priced out.

The same thing already happened to translation, except at the high end (or in highly competitive markets). If you weren’t using tools when you translated in the late 2000s, you got left behind. Hell, even if you did, technology reached the point that, for the average translator, it just didn’t make sense unless you lived somewhere with a low cost of living or worked in certain languages. It’s a lot cheaper to pay an editor to clean up a machine’s translation.

The Future Is Nigh

No matter how you slice it, we’re one right jump in technology away from a writing career making as much sense as buggy whip manufacturing. Whether that jump is the enigmatic “next 10 years” away or arrives tomorrow is a bit up for debate. GPT-3 definitely sends shivers down my spine, but enough of my job is translating business requirements into writing requirements that even the best NLP AI isn’t going to cut it without some actual cognition. A machine can spew content, but it can’t understand “a sexy, restrained article that shills hard without selling,” any more than you or I can make sense of “colorless green ideas sleep furiously.”

We’re on the precipice of automation, but how far the fall is and how long it will take to hit the bottom is the more important question. My days as a writer are numbered, just like my days as an admin were before I moved into development and automation. Ride the wave, stay ahead of the technology, add value, and use it as your weapon to fend off the future, and you can survive, but it’s going to hurt.

When we tossed out the phone operators for an automated system, sure, we got new installers and new maintainers of that system, but the number of jobs definitely didn’t match the number before. There’s also the cognitive load of each new jump. The writing’s on the wall; are you ready to read it?

Image by Michal Jarmoluk from Pixabay