A recent meeting of the Book Industry Study Group took up the topic of “AI and Authorship.” In fact, the past few meetings of that group have focused on AI, usually from a very indulgent perspective. We’ve learned how algorithmically driven tools can aid editorial workflows, considered whether chatbots might replace editors, and, in this recent meeting, explored “poetry and AI.” A dreadful idea, but we persist.
As Thad McIlroy put it in his introduction to the session, “poetry is arguably the most complex and demanding form of creative written expression, and yet it is here where AI seems to be making the greatest inroads.” The panel’s speakers were Brian Porter, who presented “a rigorous study where the machine poets beat the human poets,” and Sasha Stiles, who incorporates AI in her poetry-writing practice. Porter’s article, though, took up most of our attention, and I’m still considering its implications.
In a study published in Scientific Reports, a Nature Portfolio journal, under the title “AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably,” Porter and Edouard Machery asked “non-expert readers” to “differentiate between AI-generated poems and those written by well-known human poets.” They used a one-shot prompt to ask an earlier version of ChatGPT (GPT-3.5) to produce poems in the style of famous poets like Shakespeare, Walt Whitman, and Emily Dickinson, which were then placed beside the real thing for readers to evaluate.
According to their abstract, the study group was “more likely to judge AI-generated poems as human-authored” than they were to attribute human-authored poems to human authors. In other words, the readers thought the AI poems were written by humans and the human poems were written by the AI. They also “found that AI-generated poems were rated more favorably in qualities such as rhythm and beauty, and that this contributed to their mistaken identification with human-authored [poems].”
Deeper into the study, it’s clear these results are not as incendiary as they seem (or as the abstract makes them out to be). Readers distinguished between AI- and human-written texts at a rate close to random chance. In other words, the task was difficult, the readers were unfamiliar with the genre, and the results are more suggestive than conclusive, even if they fairly consistently suggest that readers preferred what Porter and Machery presented as AI poems to the real thing.
In their interpretation, Porter and Machery suggest this preference comes down to complexity. The AI poems are less complex than the real deal, thus they “may be easier for non-experts to understand.” They tend to rely on obvious rhymes and regular rhythms, which led to their being “rated more favorably in qualities such as rhythm and beauty.” Further, as I noticed during Porter’s demonstration, they’d slightly manipulated the original authors’ texts to level the playing field. Readers reviewed a bit of Chaucer translated into contemporary English (i.e., not Chaucer), and a similar thing seemed to have been done to Shakespeare.
I remain a bit confused about the value of a poetic Turing test like this. After all, producing human-seeming work out of language training data seems to be the only thing large language models (LLMs) can do well, and poetry, especially more formulaic poetry with regular prosody and rhyme, offers straightforward linguistic forms for the machine to fill. What do we learn about language, poetry, AI, or humanity from a black box that spits out couplets in the style (but not the substance) of Alexander Pope?
To carry the question further, I thought we might recreate this experiment, so I used a one-shot prompt to produce a poem in the style of a famous poet with GPT-4. Below, I’ll reveal the poet (although you can probably guess) and the author of each of the two poems that follow. In the meantime, try to assess these pieces using Porter and Machery’s framework: Which one is more aesthetically pleasing, and which one was written by a machine?
Poem A
I sing the surge of the boundless earth,
The breath of the cosmos, rising in waves,
A tide unbroken, pulsing through the veins of the infinite.
I sing of you, O humanity, vast and indivisible,
Your voices mingling, a choir of countless tongues,
Each note a thread in the great tapestry of being.
Come! Stand with me on this ground that nourishes all,
Feel the soil between your toes, the whispering grass,
The rivers that stretch their liquid arms to seas unseen.
Here is no smallness—every stone, every blade,
Holds the weight of the universe,
The infinite compacted into the finite.
Poem B
On the beach at night alone,
As the old mother sways her to and fro singing her husky song,
As I watch the bright stars shining, I think a thought of the clef of the universes and of the future.
A vast similitude interlocks all,
All spheres, grown, ungrown, small, large, suns, moons, planets,
All distances of place however wide,
All distances of time, all inanimate forms,
All souls, all living bodies though they be ever so different, or in different worlds,
All gaseous, watery, vegetable, mineral processes, the fishes, the brutes,
All nations, colors, barbarisms, civilizations, languages,
All identities that have existed or may exist on this globe, or any globe,
All lives and deaths, all of the past, present, future,
This vast similitude spans them, and always has spann’d,
And shall forever span them and compactly hold and enclose them.
* * *
So, how did you do? Did you prefer Poem A or Poem B? Which one was written by the machine? Well, the first one. The second is Walt Whitman’s “On the Beach at Night Alone,” presented in its entirety. ChatGPT called the first one “Ode to the Living Current.” It continues for six more extraneous stanzas, glutted with Whitmanian clichés, such as reverie for the working classes, images of “sun-browned arms,” more effusive fawning over the wonder of existence, and, of course, singing the body electric: “I sing the body, yes! But not the body alone— / The body electric, radiant, enmeshed with the soul.”
As a longtime reader of Whitman, I find “Ode to the Living Current” so obviously derivative, so plainly a shadow of Whitman, but it illuminates precisely the trouble with LLMs. In the study, the AI Whitman outperformed the real Whitman in the qualitative category. In other words, readers preferred the AI Whitman to the real thing, which is honestly not surprising and fits well with the researchers’ suspicion that people preferred the AI poetry for its accessibility.
Compare the above reference to “the body electric,” with its palpable Hallmark pablum, to this excerpt from Whitman’s real celebration of that body, which takes on sexuality, slavery, and a plurality of other subjects the AI wouldn’t touch:
A woman’s body at auction,
She too is not only herself, she is the teeming mother of mothers,
She is the bearer of them that shall grow and be mates to the mothers.
Have you ever loved the body of a woman?
Have you ever loved the body of a man?
Do you not see that these are exactly the same to all in all nations and times all over the earth?
If any thing is sacred the human body is sacred,
And the glory and sweet of a man is the token of manhood untainted,
And in man or woman a clean, strong, firm-fibred body, is more beautiful than the most beautiful face.
Have you seen the fool that corrupted his own live body? or the fool that corrupted her own live body?
For they do not conceal themselves, and cannot conceal themselves.
See, the real Whitman is more vast, rough, and unmanageable than even our best-informed assumptions about him can predict. The AI Whitman takes the average of all the clichés we learn about Whitman in high school English class and amplifies them into a word salad of predicted text, ticking all the boxes but never exploding them into new categories. But the real Whitman? The real Whitman was a visionary genius who could hang his soul on a thread of spider silk and cast it across centuries with a gust of breath.
LLMs work by ingesting “training data” and using that training to predict the supposedly appropriate next word in a sequence of words. ChatGPT could never predict that I’d end this sentence with a pair of blue lobster-shaped Crocs. Nor could it imagine the universe of complexity and invention our greatest poets explore. And when it comes to poetry, the complexity is the point.
We might take some pleasure in the frisson of comfortable familiarity that generated poetry offers, but it is the same empty pleasure we get when we choose to stream Bridgerton instead of grappling with Sense and Sensibility. Maybe there’s enough room in the annals of literature for both, but in the meantime, I’ll take my poetry corrupted by the teeming, foolish body of human beings.
I am an advocate promoting poetry. I find this post fascinating for many reasons.
Thoughts:
1. Does that mean that poetry as a human form of expression should be reclassified? As a judge and publisher of poetry I hold fast to the belief that words generated by a human being will be preferred.
2. The rush to use this new tool has me asking: what is the rush to put more "words" quickly and thoughtlessly into the world? A human poet observes, reflects, then writes and recites. Are we going to start having open mics for AI too?
It is early and I share the first thoughts that come to my human mind...
So well written. I swagged that A was AI-written, but only because its phrases did not seem as unexpected as many in B. If the point of poetry is supposed to be an act of creation, it seems fair to argue that anything written by AI would be the opposite of poetry. Isn’t AI basically a sophisticated rearrangement of what already exists?