
This Robot Can Rap—Really

Deep-learning robot Shimon writes and rhymes in real time

Musical robot Shimon in front of a marimba. Credit: Gil Weinberg

What if your digital assistant could battle rap? That may sound far-fetched, but Gil Weinberg, a music technologist at the Georgia Institute of Technology, has adapted a musical robot called Shimon to compose lyrics and perform in real time. That means it can engage in rap “conversations” with humans, and maybe even help them compose their own lyrics. Shimon, which was intentionally designed to sound machinelike (listen here), is meant to be a one-of-a-kind musical collaborator—or an inhuman rap-battle opponent.

Computer-generated music dates back to the 1950s, when early computers used algorithms to compose melodies. Modern robots can use machine learning to ad-lib on instruments including the flute and drums. One such machine was an earlier version of Shimon, which could play the marimba and sing. The recently updated robot looks the same; it still consists of a ball-shaped “head,” with saucy movable eyebrows above visor-covered eyes, perched at the end of a mechanical arm. But now Weinberg claims Shimon is the first improvising robot to venture into rap, a genre whose distinctive stylistic features pose unique programming challenges.

The crowning glory of rap lies in the lyrics. On top of semantic content, the words need to adhere to an aesthetically pleasing beat and rhythm, all while delivering multiple layers of poetic complexity. In a recent paper, published in the proceedings of the 11th International Conference on Computational Creativity 2020, Weinberg’s research team outlines the technical advances that brought a rapping Shimon to life.


When Shimon battle raps, software converts its human opponent’s spoken lyrics into text. The robot’s system identifies keywords in that transcript and uses deep-learning models, trained on several custom data sets of words, to generate new lyrics. These data sets can come from any text: the work of Lil Wayne, JAY-Z or other rappers; lyrics from other genres; or even nonmusical literary works. Imagine how Shakespeare or Jane Austen might sound if they rapped; Shimon could simulate that for you.
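
To make that pipeline concrete, here is a minimal Python sketch. It is an illustration, not Shimon’s code: the speech-to-text step is assumed to have already produced a transcript, the deep-learning generator is replaced by a simple stand-in, and all function names are hypothetical.

```python
import random
import re
from collections import Counter

# Words too common to count as topical keywords (a toy stopword list).
STOPWORDS = {"the", "a", "an", "and", "or", "but", "i", "you", "your",
             "it", "to", "of", "in", "is", "my"}

def extract_keywords(transcript: str, top_k: int = 3) -> list[str]:
    """Pick the most salient words from the opponent's transcribed verse."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_k)]

def generate_line(keyword: str, corpus_lines: list[str]) -> str:
    """Stand-in for the trained deep-learning generator: reuse a corpus
    line that mentions the keyword, or fall back to a template."""
    hits = [line for line in corpus_lines if keyword in line.lower()]
    return random.choice(hits) if hits else f"you brought up {keyword}, now watch me flip it"

# In the real system the transcript comes from speech-to-text software.
corpus = ["money on my mind, mind on the money",
          "clock don't stop and the flow don't stop"]
for kw in extract_keywords("your flow is weak, my money does the talking"):
    print(generate_line(kw, corpus))
```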

One novel element of Shimon’s design, its creators say, is the additional use of phoneme data sets to conceive new lyrics. Phonemes are the distinct units of pronunciation that make up the sound of a word. Breaking down keywords into these units is the most effective way to integrate rhyme into the lyrics, says Richard Savery, the first author of the paper on Shimon and a music technologist at Georgia Tech. “The way phonemes relate between words is really important,” Savery explains, sometimes even “more important than the actual meaning of the words.” The training data set of phonemes enables Shimon to churn out keyword-centric phrases in rhyme, and the robot then layers a rhythmic beat onto its speech.
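
A small example shows why phonemes matter. The sketch below uses the open-source pronouncing library, a wrapper around the CMU Pronouncing Dictionary, to compare words by the phonemes from their last stressed vowel onward; this is a common way to test for rhyme, not necessarily the method the Georgia Tech team used.

```python
# Toy phoneme-level rhyme check, assuming the third-party "pronouncing"
# library (pip install pronouncing). Not Shimon's implementation.
from typing import Optional

import pronouncing

def rhyme_tail(word: str) -> Optional[str]:
    """Phonemes from the last stressed vowel onward: the part of a word
    that must match for a (perfect) rhyme."""
    phones = pronouncing.phones_for_word(word)
    if not phones:
        return None  # word not in the CMU dictionary
    return pronouncing.rhyming_part(phones[0])

def rhymes_with(candidate: str, keyword: str) -> bool:
    a, b = rhyme_tail(candidate), rhyme_tail(keyword)
    return a is not None and a == b

print(rhyme_tail("battle"))             # 'AE1 T AH0 L'
print(rhymes_with("rattle", "battle"))  # True
print(rhymes_with("robot", "battle"))   # False
```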

Shimon’s systems must be fast enough to respond in real time without compromising performance quality. To achieve this, the researchers made several tough programming decisions, such as capping Shimon’s response vocabulary at around 3,000 words and truncating the length of time Shimon will “listen” to its opponent. So far Shimon can rap a comeback in less than seven seconds, while improvising gestures such as head bobbing and eyebrow waggling. Hardware upgrades, such as a more powerful graphics processing unit, will eventually make the process speedier.
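
In code, such trade-offs could look like the hedged sketch below. The 3,000-word cap comes from the article; the listening cutoff is purely illustrative, since the article gives no figure for it.

```python
# Hedged sketch of latency-motivated limits: cap the vocabulary so the
# generator decodes quickly, and stop "listening" after a fixed window.
from collections import Counter

MAX_VOCAB = 3_000      # response vocabulary cap reported in the article
LISTEN_SECONDS = 10.0  # illustrative cutoff; the article gives no figure

def build_vocab(corpus_tokens: list[str]) -> set[str]:
    """Keep only the most frequent words so generation stays real-time."""
    return {w for w, _ in Counter(corpus_tokens).most_common(MAX_VOCAB)}

def truncate_audio(samples: list[float], sample_rate: int) -> list[float]:
    """Drop anything past the listening window before transcription."""
    return samples[: int(LISTEN_SECONDS * sample_rate)]
```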

No individual component of Shimon’s technology is completely new—but this particular assembly of parts is, says Prem Seetharaman, a research scientist at the tech start-up Descript, who was not involved in the project. “Generally, the field is pretty siloed into different things like speech-to-text, text-to-speech, music,” Seetharaman says. “The field is approaching a good [enough] level of complexity so that people are able to take these [components] and connect them together into really interesting interactive systems.”

Beyond Shimon’s novelty value, Weinberg wants his robot to provide opportunities for people to experiment with new kinds of music. “It’s not interesting to me if [Shimon] does its thing without humans … as a completely autonomous musical system,” he says. His goal is to see his robots “communicating and interacting with [humans] and inspiring them in surprising ways.” Weinberg had never written lyrics before, but says Shimon enabled him to produce songs for the first time. He adds that he has even received requests for help from lyricists afflicted by writer’s block.

Seetharaman, a recreational musician himself, also says he is excited by the possibilities Shimon’s technology might offer nonmusicians. “Tools that use AI can reduce the barrier to entry … to making art,” he says. “People do it all the time: you see people make Instagram Stories and TikTok [videos].”

Professionals, however, have some reservations. Rhys Langston, a rapper and multimedia artist who was not involved in the project, says he would be keen on rapping with Shimon, especially since the COVID-19 pandemic has limited the in-person interactions from which Langston derives most of his inspiration. He says it is impressive what artificial intelligence can achieve—but also suggests that robots simply cannot access the inspiration that sometimes serendipitously arises from things like human error. During a recording session, Langston explains, mistakes occasionally end up in a final recording because they sound surprisingly good. Accidents “unlock possibilities because not everything [in a recording] is planned out,” he says. “Can you teach a machine to make mistakes?”