The Simplest Fix

The jangling tswaerp of an alert notification pulled Dave away from his casual browsing. Irritated, he reached for his phone; it was surely just another false alarm, but making sure it was a false alarm was his job, after all.

Primary service endpoint inaccessible for 15 seconds, read the alert text, and Dave sighed in annoyance. How many times had he argued for raising that threshold? Even a simple network outage, perhaps a farmer digging where they shouldn’t, could cause a blip like that as the networks auto-routed around the damage. But the suits said 15 seconds, so 15 seconds it was.

Well, a simple confirmation could solve this. He opened a fresh client instance and waited for the service’s greeting page to show.

And waited.

And waited, as lines of worry began to crease his brow.

Even before the client timed out and displayed its error message, Dave set the phone down and brought up the system management interface on his work terminal. Then he stared in disbelief at the story it told: a bare ten percent of the allocated server instances running, that number dropping even as he watched. “What the hell . . . ?” he whispered.

Tswaerp, his phone replied helpfully. Primary service endpoint inaccessible for 60 seconds.

Jabbing the alert away, he pulled up the event log. Was there some sort of hardware issue, like a cooling failure causing all the machines to shut down? Was it perhaps just a particularly nasty configuration mistake, a bad setting which was only now taking effect? Or was there something more nefarious going on?

But the story the log told did not appear to fit any of those patterns. Server instances going down, being restarted, then immediately failing again, continuing even as he watched. Load balancers reconfiguring themselves over and over. And—build failures?

Tswaerp. Auto-repair actions exceeding threshold.

Dave reached reflexively for the “mute” button, then hesitated at the unfamiliar alert.

Tswaerp. Primary service endpoint inaccessible for 60 seconds.

This time he did mute the alerter, but . . . auto-repair actions? That could only mean the auto-debugging system they’d put in place five years ago now, tying the AI code generator and fault analyzer directly into the service management system. If a server process ever logged an error, the analyzer would kick in, find the cause, and send a request to the code generator—which would, in turn, automatically update the server code to fix the bug. It very neatly solved the lag of a human having to run the analyzer, look at what it reported, retype that into the generator, and then copy the generated code into the right place (which actually turned out to be the greatest source of human-introduced bugs). Dave himself, as chief implementor, and Jess, the project’s chief engineer, had gotten promotions for that work, which ultimately meant they now got a salary for little more than sitting in a mostly-empty office and relaxing all day.

Dave got up from his seat now and walked over to where Jess sat, music faintly leaking from earbuds as she tapped away on her keyboard, seemingly engrossed in some pet project or other. “Jess,” he said, raising a hand to get her attention, “got a minute?”

“I’m sorry, Dave,” she said without looking up. “I’m afraid I can’t do that.”

Dave rolled his eyes. “I will build an airlock in the wall and kick you out. Seriously, we got a problem.”

Now she did look up, taking one earbud out. “Problem?”

“Something’s screwy with the auto-repairer. Server instances down, tons of build failures in the logs—”

“Wait, what?” The other earbud came out as well, and she tossed them onto her desk as she loaded the management console on her own terminal. It showed the same state Dave had seen, except that the active instance ratio was now down to a tiny fraction that might as well be zero. She quickly went to the error list, picking one of the many build failure reports shown there and scanning over the output log.

“What the . . . type mismatch? Seriously?” She stared at the offending line in disbelief for a moment, then opened a source code browser at the relevant version, already outdated as the auto-repair system continued submitting changes.

“Here,” she said a moment later, pointing to a line which had been added in the previous repair attempt. “It’s defining a new type, but it’s only using it here, so of course it doesn’t match—but I know I tested this case before . . .” She began scrolling through the version history now, looking at other changes the auto-repairer had been making. “Yeah,” she continued after a moment, “all of these make sense locally, but it’s like the generator suddenly got stupid or something.” She thought for a moment. “We didn’t have a model update recently, did we?”

Dave shook his head. “Last one was . . . a year ago, I think? And it’s been working fine for my scripts, at least.” All two he’d touched in recent memory, to be fair, but he hadn’t noticed any problems in the AI-produced code. At the least, it was probably better than what he could write himself. Who ever hand-wrote code anymore?

“A year ago . . .” Jess repeated, jumping back to that point in the history, then slowly scrolling forward. And now that he looked at it, Dave could see that there was a slow increase in the rate of changes since that point: nothing obvious until today’s still-unfolding disaster, but more frequent repair events, and sometimes clusters of two, three, or even more changes for a single incident. “Yeah, something’s fishy,” Jess said half to herself, “but . . . unless . . .” A frown crossed her face. “Ugh, I hope not.”

“Hope not . . . what?” Dave wasn’t sure he wanted to hear the answer, but he couldn’t help asking.

“Old theory I read once,” Jess explained. “Claimed AI had a ceiling, would get worse over time as it trained on its own output, like the old telephone game.” She sighed. “Everybody assumed it was bunk, but—”

“Hey, you guys?” came a voice from across the room.

“We’re on it,” Jess shouted back even before Dave glanced up to see their boss Sam looking uncertainly toward them.

“We’re starting to get complaints, so, uh, just so you know . . .”

“We’re on it,” Dave repeated, then refocused his own attention on the display. Explaining the problem could wait until they’d figured out what to do about it.

But . . . what, indeed, to do about it? They could hardly return to the previous model; that would have long since been consigned to the electronic dustbin by the AI megacorp which ran it. The theory would be to roll back their own code to the last working version, get that up so at least the service was running, manually fix whatever bug triggered the current disaster (hoping it was within their manual coding skills to fix), and then start digging into—

“Shit,” said Jess amid a sudden flurry of keyboard tapping.

Startled out of his contemplation, Dave saw that Jess had opened a command terminal and was forcibly shutting down one internal service after another. “Wait, what?”

“Repo’s corrupt.”

Shit, thought Dave.

“The change rate must’ve triggered some storage bug—all the VCS ops are returning errors. I’m stopping everything, gonna see if I can recover it.”

Dave nodded worriedly. Their version control software was widely used throughout the world, and ought to have been well tested against faults like this. But then again, the authors of that software were probably also using AI to write their code now, and if they were using this same, supposedly new-and-improved model . . .

Then a thought occurred to Dave. “Hey,” he said to Jess, “when you were first building the repairer, you did some tests with having it generate code from scratch, right?”

Jess grunted an acknowledgment, still focused on her cleanup attempts.

“You still have that code somewhere?”

“Uh . . . check under ‘experiments’, I think.” Then Jess glanced up in surprise. “Wait, you want to use that to rebuild the code?”

“More I just want to see if the current model can pass those, but if the repo’s busted, that might be our only option.” Dave paused. “That, or rewrite it all by hand.”

Jess stared at him in response, somehow combining “have you completely lost your marbles”, “we’re in way over our heads”, and “this is going to get worse, isn’t it” in a single facial expression. Dave simply nodded agreement to all of those facets as he turned back toward his own seat.

“Hey, uh, you guys?” came Sam’s voice again. He recoiled slightly at the sharp look Dave and Jess both shot in his direction, then continued, “Just, uh, wondering if this is related to the SocialNet thing?”

“The what?” Dave reflexively reached for his phone before remembering it was still sitting on his desk, with probably dozens of alerts stacked up by now.

“SocialNet went down a few minutes ago. I thought it was weird we were getting complaints on the smaller networks but not on SocialNet, but . . .”

Jess had already picked up her phone and was staring at it, color draining from her features. Shortly she looked up, letting her gaze land at a nondescript point on the far wall. “Do I want to know what else is going to go wrong today?”

The overhead lights chose that moment to go dark along with their work terminals, leaving the room lit by two phone screens, an emergency exit sign, and a row of windows onto a cloudy sky.

“So . . .” Dave said into the ensuing silence. “Anyone up for some ramen?”


Andrew Church - achurch@achurch.org