Archive

Scenario: Manipulations

Christina Class & Debora Weber-Wulff

AlgoConsult is a company that develops highly specialized computer processes for a wide range of applications. For marketing purposes, these processes are called “algorithms” for the customers, and increasingly the term is used in-house as well, even though, strictly speaking, that’s not what they are.

With their “CompanyRate” project, the firm is building a rating system for investors in the German banking sector. The ratings index is intended to facilitate future investment decisions. A pilot version was presented to a few selected beta customers last month, and the initial feedback has been glowing.

Various machine-learning approaches are combined to generate the ratings. The number of influencing factors is enormous: in addition to stock market prices and current market information, data on known advertising budgets, media presence, trade fair participation, market shares, etc., are used. AlgoConsult keeps its algorithms and influencing factors strictly confidential, especially to prevent manipulation of the “CompanyRate” index: it’s a feature AlgoConsult touts on its project website.

Therefore, all project participants are carefully chosen from employees who’ve been with AlgoConsult for at least a year, and all must sign project-specific NDAs. They are prohibited from investing privately in indexed companies or in funds whose portfolios hold a significant share of these companies. In exchange, the software analysts are paid handsomely.

Achim is a proud “CompanyRate” core team member responsible for the index. One day, as he’s leaving for lunch, he stops by to pick up his colleague Martin. They always have lunch together at the Japanese or Mexican restaurant on the first floor of the high-rise building, but Martin says he’s brought something from home today. At the elevator, which can only be opened with a pass card, Achim realizes that he left his wallet in his raincoat, so he returns to the office.

When he reaches the office door, he can hear Martin—who is usually very quiet—speaking loudly on the phone. Since he’s alone in the hallway, Achim eavesdrops furtively on the conversation. He thinks he can make out that Martin is on the phone to someone at PFC, “People’s Fruit Company.” Despite its American name, PFC is a German company listed in the index. Achim clears his throat and enters the office. Martin quickly drops the call.

“My mother, she has to call me every day about something,” Martin chuckles nervously. Achim grabs his wallet and meets his colleagues from another department for lunch as usual, but he has trouble concentrating on the conversation.

Even though the project is at a lull because the index is being tested with beta customers, Martin is unusually focused and busy all afternoon. He doesn’t even seem to have time to break for coffee. He is still working when Achim drives home.

The following day, Achim sees that Martin entered a bunch of code the night before, and it successfully ran through the test suite overnight.

“Geez, Martin’s been busy,” thinks Achim while looking at the logs. The boss will be happy about all the documentation added in various corners of the system. Achim decides to pull up a few of the program changes—all of which contain comments, as usual. But he wants to be sure the documented changes were, in fact, correct.

The third change throws Achim for a loop. The formulas themselves have also been changed. Now, bizarrely, a value is read from a file in the cloud instead of being calculated inside the program as before. On closer inspection, he realizes that the cloud value is only applied in cases that almost exactly match PFC.

Now, Achim isn’t sure what he should do. Just last week, Anne was terminated without notice, even though she is an excellent programmer. In the past few weeks, she’d seemed somewhat out of sorts and had scheduled a meeting with her boss. Afterward, security immediately escorted her to her desk to collect her belongings and showed her to the door. That was pretty shocking for the team. But to avoid being dragged into anything, everyone acted as if nothing had happened.

That evening, Achim had tried to contact Anne several times, but she’d rebuffed him. That weekend, he stopped by her place and waited outside the apartment until she left to go shopping. She snapped at him when she saw him: “Please just leave. I can’t talk!”

Achim is shaken. He’s known Anne since college. She is extremely talented and honest. He can’t imagine she’d violate the terms of the NDA or do anything that could harm AlgoConsult. Was it possible that someone else had manipulated the rating system, and she reported it? Should he take the risk of telling his boss about his observations? Was it worth putting his top-notch salary on the line? As he climbs into his car, there’s a re-broadcast of the local radio station’s economic program airing. The press secretary for an investor group is being interviewed, and he says they are developing new AI-based software designed to help small investors find good investment opportunities and help them make investment decisions.

What should Achim do?

Questions:

  1. Is there any reason to justify Achim’s eavesdropping on Martin’s conversation?

  2. How can Achim be sure that Martin was talking on the phone to PFC? Does that even matter in this scenario?

  3. Is it okay for Achim to use changes to the program to examine Martin’s work more closely? How far should this kind of “double-checking” go?

  4. What are we to make of the fact that Martin generated so many changes and additions to the internal program documentation to “conceal” the change he made to the calculation? Is it possible that Martin used the copy-paste function for comments he made to the changes? How reliable are these kinds of comments? How important is it to document programming changes precisely?

  5. Is it safe for Achim to assume this change was made deliberately to give PFC an advantage?

  6. Is it possible that the boss knows about Martin’s changes or that he instructed him to make them?

  7. Should it matter to Achim that Anne has been fired and no longer wants to talk about it?

  8. As a general rule of thumb, is it advisable to work on software systems subject to non-disclosure agreements?

  9. Rating and matching algorithms can constitute a company’s core business and central asset. These algorithms and influencing factors are, therefore, often kept secret. This is understandable from an economic point of view, but how dependent do users become on the algorithms? What opportunities for manipulation arise? What dangers can arise not only for the individual user but also for economic systems or societies?

Published in Informatik Spektrum 39 (3), 2016, pp. 247–248

Translated from German by Lillian M. Banks

Scenario: The Nightmare

Christina B. Class & Debora Weber-Wulff

Andrea, Jens, and Richard are sitting at their favorite pizza joint in Hamburg, celebrating the successful completion of software tests for their “SafeCar” project with beer and pizza. They can take the project to the DMV in Hamburg tomorrow for inspection, and with that—they’re done. They won’t have to pull an all-nighter.

They work for SmartSW GmbH in Hamburg, where they are tasked with developing a software control system for the safety component on the new car model KLU21 produced by German automaker ABC.

ABC has come under increased economic pressure of late because of its inordinate focus on internal combustion engines over electric motors. Emissions scandals surrounding diesel engines in recent years have made things increasingly difficult for the company.

KLU21 is the company’s latest energy-efficient hybrid model that will be equipped with a new intelligent driving control system and an enhanced safety system. This new intelligent driver-assistance feature will provide drivers with real-time data. Using the latest communications technologies, ABC is hoping to improve its tarnished reputation as it re-brands itself as a state-of-the-art company and a pioneer in the field. It’s hoping this will help win back some of the market share it lost and hold on to some jobs.

As part of the SafeCar project, a safety system for KLU21 with artificial intelligence was developed. The software prevents the car from drifting out of its lane using information from external cameras linked to the vehicle’s location, current traffic data on road conditions, and satellite data loaded in real-time from a secure cloud. The vehicle is also aware of other cars on the road and is programmed to maintain a safe driving distance.

At the same time, a camera aimed at the driver detects when the driver’s eyes are closed, which could mean that they have nodded off or lost consciousness. An alarm will sound to wake the driver. If the system does not register an immediate reaction, the autopilot takes over and brings the vehicle to a stop at the side of the road, taking into account road conditions and traffic. To minimize the risk to all parties involved in any traffic event, the maximum driver reaction time before the software takes control of the vehicle depends on site-specific road conditions and traffic.

Andrea, Jens, and Richard suggested to the project director that the software should also detect when a driver is fiddling with their cell phone, display a warning on the dash, and sound an audible warning signal. Jens and Richard took it a step further to suggest that the software detect when a driver is applying lipstick and the car should whistle, though not every time, only every sixth time or so. They were just kidding, and this feature was not included.

It was easy to send the images to the cloud cluster over the 5G network, where they could be analyzed. However, detecting drivers nodding off or losing consciousness as accurately as possible requires that the program be trained on mountains of data specifically prepared for that purpose. Recognizing someone using a cell phone or lipstick, by contrast, was a simple finger exercise, and ample “training data” (images and videos) was readily available at no cost on the Internet.

In the event of a mechanical breakdown, or if the driver loses consciousness or is involved in an accident, an emergency call is automatically placed to the police with the relevant data about the car, the type of breakdown or accident, whether an ambulance is needed, and the vehicle’s exact location.

After all the overtime they’ve put in, Andrea, Jens, and Richard are having a good time celebrating the successful completion of the tests. After many successful in-house tests, they’ve spent the last two days driving through city streets and the greater metropolitan area with a system installed on the passenger side, logging all the information and commands from the system. Results show that the system accurately recognized every situation it encountered. Their boss has already congratulated them on a job well done. They’re hyped and sit around chatting until well past midnight.

After cabbing it home, Andrea is suddenly thirsty, so she plops down on the couch with a bottle of mineral water and turns on the TV. There’s a rerun of a talk show airing a segment about the current status of the 5G network expansion in Germany—a topic that’s been in the news for months. Once again, the discussion turns to how many dead spots in rural areas have yet to receive coverage. To top things off, rules to regulate national roaming fees remain inadequate.

One corporate executive claims that once the 5G rollout is complete, there will only be a few dead spots remaining. But a viewer vehemently insists that even with the new network, she only has reception in her garden. Her new neighbor has a different network provider but has to go to the cemetery at the edge of town because there is no roaming mandate. A heated discussion ensues….

Andrea is exhausted. She leans back. Her eyes fall shut … She’s driving a car along a winding road in foggy weather. Visibility is next to zero. She’s glad she’s driving a KLU21; it helps her to stay in the right-hand lane. But suddenly she notices that the car’s steering assistance has stopped working. She is unprepared for this, and before she can respond, her car leaves the road and crashes sideways into a tree. She is thrown forward and hits the airbag at an awkward angle. She passes out.

She can see herself sitting unconscious in her car. Help is on the way; luckily, KLU21 has placed an emergency call. She looks around the car: the emergency light isn’t blinking; why hasn’t an emergency call been made? She’s bleeding! Why isn’t anyone coming to help? Suddenly, she knows! She’s in a dead spot … The steering assistance can’t function as it should without real-time data, nor can the emergency call center be notified …

“Hello? Can anybody hear me? Why aren’t there any cars on the road here? Why isn’t anyone coming to help? Help, I’m bleeding! I’m bleeding to death!”

Andrea wakes up in a cold sweat! It takes her a moment to calm down and realize that it was just a dream, that she is at home in her apartment, and that she is okay.

Then she remembers today’s software tests in and around the city and the talk show with those complaints about inadequate network coverage in some rural areas. Roads run through those areas, too. Why didn’t it occur to them while they were developing the software for KLU21? Why didn’t they ever bring it up with their client? Her aunt lives in Sankt Peter-Neustadt and has often complained about the poor connectivity. She needs to talk to her boss …

Questions:

  1. Designing tests is very difficult. In this example, the developers were supposedly heavily involved in the tests. Is this justified? What guarantee do we have that the tests won’t be manipulated to produce inaccurate results? How strictly should test parameters be required to be directly dependent on the safety relevance of the software? How heavily do production deadlines and economic pressures influence the implementation of software tests?

  2. No amount of testing can guarantee that any software product is free from error. It’s even harder to prove that the software won’t do something it hasn’t been asked to do. In this scenario, a so-called “Easter egg” was to be included—the lipstick recognition feature—but not activated. What dangers might be involved in this kind of “Easter egg”? Does it matter if the developers have programmed the Easter egg during their “free time”?

  3. This scenario involves a software application that is critical to the safety system and was trained using training data. How can the quality of training data be guaranteed? What guidelines should/could be set in this regard? When and under what conditions should a system be forced to be re-tested using new data? What form should such new testing take once the system has been fed new training data? Do all the tests that have previously run need to be repeated? Should the same people responsible for defining the training data also be involved in developing tests? If not, why not?

  4. The internet is filled with images and videos that are “free.” Can these materials be used to generate training data? If not, why not? If so, should any restrictions be placed on the types of things they can be used for?

  5. Open data (readily accessible, application-specific data sets) exist for specific applications, such as data mining and AI contests. Is it permissible to use these data sets for other purposes/research questions? What requirements must be placed on the documentation of these datasets, specifically concerning data source, data selection, and conclusions drawn? How great is the likelihood that false conclusions will be drawn when using these data types?

  6. In current discussions about 5G, there is often talk about the need for nationwide coverage and national roaming—how important is this? Is it reasonable to create applications that are critical to safety before there is any guarantee that infrastructure is available nationwide? Would it be reasonable to write applications strictly for specific locations—for use in the city, for example? What kind of compensation is due if a user relocates and can no longer use the system?

  7. Many areas today are already adversely impacted by dead spots (areas with no reception)—places where rescue teams or the police can’t be reached by cell phone, or where emergency services cannot contact the call center. To what extent is political intervention needed to force regulation? How great is the digital divide in Germany? As an industrialized nation, can we afford to let this stand or even worsen, especially now with the expansion of the 5G network?

Published in Informatik Spektrum 42(3), 2019, pp. 215–217, doi: 10.1007/s00287-019-01171-4

Translated from German by Lillian M. Banks

Scenario: The Analog/Digital Divide

Debora Weber-Wulff & Stefan Ullrich

Matthias and Melanie have known each other since college. While Matthias was studying medicine, Melanie was studying computer science. They met while working on university committees and soon moved in together. After graduation, Matthias took a job in public health for altruistic reasons. Melanie has an excellent job as a software engineer, so money is not a problem for them.

Since the start of the coronavirus pandemic, Melanie has been working from home, but Matthias has increasingly been called into the office to work. With all the attendant paperwork, the sheer volume of cases is growing exponentially. Matthias does his best, but many things just don’t get done. He’s totally frustrated by the slow pace of everything: you’d think people would be able to find out pretty quickly whether or not they’re infected. He’s even been known to help out the medical and technical assistants and spends several hours at a time clad in protective gear collecting samples. He and the whole team must be meticulous about filling out the forms legibly. And it must be done by hand in blue ink—only the QR code on the sticker they affix to the top is digitized. The samples are sent to various laboratories, but it takes days, sometimes weeks, before results are processed and the chain of infection can be traced.

One evening, while sitting on the balcony sharing a glass of wine, Matthias complains about how slow everything is going. Melanie can’t understand the problem: why does everything take so long? The testing itself only takes a few hours, and labs are now working three shifts daily.

“What’s the problem?” she asks.

“Unfortunately, I can’t tell you exactly what steps are involved because everything we do at the office is strictly confidential.” His department head has repeatedly told him: “What happens at the office stays at the office.”

“Don’t be silly,” says Melanie. “I don’t care who’s been tested or what the results are. I’m just trying to understand what’s going on at your office and why it takes so long to get results. Process optimization is my specialty. Maybe I can help.” Matthias pours himself another glass of wine and starts talking. “Don’t laugh,” he says, “and please don’t tweet this out. Each lab has its own paperwork to complete for orders and results. We rotate labs regularly to distribute the workload more evenly. And since we don’t have the equipment to do anything digitally on-site, we manually fill out all the forms—hand-written in blue ink; that way, the administration can more easily distinguish between the original and black-and-white copies. Samples and forms are placed in a post box, and when it’s full, a courier comes to pick up the samples and forms and deliver them to the lab.”

“Well, that’s a good stopgap solution. Of course, digitizing everything from the get-go would be better, but okay. That takes time. What does the lab do with the results?” asks Melanie.

“They type the data from the requisition form into their system for tracking test results and billing, then enter the results and fax them back to us here at the health department.”

“They FAX them!?” Melanie snorts. “Ooooo-kay, then. It sounds like we’re getting closer to the bottom of it!” she chuckles.

“It’s not as dumb as it sounds. We deal with confidential personal data that shouldn’t be sent online.”

“True, but you can overdo it. I can see how no one in your office would be able to handle setting up an encryption infrastructure, especially not one designed for external communications. But still, it’s all good—you should still be able to get the results the next day … by fax.”

“No, it’s not that simple. The computer we use to receive faxes saves them all with the filename ‘Telefax.pdf.’ At least they’re numbered sequentially, but we still end up with several thousand files with practically the same name—a thousand new ones each day! So, someone has to sit there renaming the files so we can file them in the proper folders. They have to open each file, check its reference number, see whether it’s positive or negative, and then save the files in different directories, depending on the result. You can imagine that sometimes mistakes are made with the manual renaming of the files or in sorting, and then we have to contact the lab to fix it. Only now that we’re processing so many tests daily have we realized how much of a time-suck manual data processing is. Here,” he pauses for a moment, “everything’s falling apart.”

Melanie can scarcely believe her ears. “You can’t be serious! What kind of computer are you using, and with what software?”

“I don’t know exactly, but we can’t connect it to the internet because its operating system is so old. But it’s good enough for receiving faxes—just not when we’re in the middle of a pandemic!” Matthias replies.

Melanie shakes her head. “Why don’t you just go to MegaMarkt and pick up a fax machine? The new ones can send files straight to your PCs, even if you work from home. And pick up some OCR software while you’re at it. That way, you can read the results and file them where they belong, all at the same time! Unlike humans, machines don’t make mistakes!”

“Great, but we aren’t allowed to use such equipment. Everything has to go through procurement and the IT department—it must be some data protection thing—I don’t know what it’s called. Then, there’s the bit about protecting patient health information. And everyone has to go through the necessary training. It takes at least six months—if not more—for anything like that to go through!”

Melanie gulps down another big swig of wine and says: “Tell you what. We will buy one of those machines, and I’ll set it up over the weekend. My fees are too high for you to pay, but I’d be willing to do it for free because I want these tests done faster! You have to get tested regularly, too, and I want you to know whether you’ve been infected as soon as possible. What do you say?”

Matthias is unsure whether he should accept the offer. The way Melanie describes it, it’s tempting and sounds easy enough.
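The kind of automation Melanie is proposing amounts to a short script rather than a six-month project. What follows is a minimal sketch, not the health department’s actual setup: the folder names, the form label “Referenznummer,” the German-language OCR model, and the use of the pdf2image and pytesseract packages (which require the poppler and tesseract tools to be installed) are all assumptions made for illustration.

```python
import re
import shutil
from pathlib import Path

import pytesseract                        # OCR wrapper around the tesseract binary
from pdf2image import convert_from_path   # renders PDF pages via poppler

INBOX = Path("fax_inbox")                 # hypothetical folder where the fax PDFs land
OUTBOX = {"positive": Path("results/positive"),
          "negative": Path("results/negative")}

for pdf in sorted(INBOX.glob("*.pdf")):
    # Render the first page as an image and run OCR on it (German language model assumed).
    first_page = convert_from_path(str(pdf), dpi=300)[0]
    text = pytesseract.image_to_string(first_page, lang="deu")

    # Pull out the reference number and the result; the form labels are assumptions.
    ref = re.search(r"Referenznummer:\s*(\S+)", text)
    result = "positive" if re.search(r"\bpositiv\b", text, re.IGNORECASE) else "negative"

    if ref is None:
        print(f"{pdf.name}: no reference number found, leaving it for manual review")
        continue

    # File the fax under its reference number in the folder matching its result.
    target_dir = OUTBOX[result]
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(target_dir / f"{ref.group(1)}.pdf"))
```

Whether a health department could deploy such a script without going through procurement, the IT department, and data protection review is, of course, exactly the tension the scenario describes.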

Questions:

1. Is it necessary to keep standard operating procedures at the health department confidential?

2. Might there be a moral reason to violate confidentiality under certain circumstances?

3. Was it okay for Matthias to tell Melanie what he did about the office procedures at the health department?

4. Is there anything wrong with the laboratory faxing the results to the health department?

5. Morally, would it make a difference if the OCR software made a mistake instead of a human mislabeling something?

6. Is Melanie being too careless with networked devices?

7. Matthias has voluntarily committed to working extra hours at the expense of family time together. How should this be assessed in moral terms?

8. In most cities, the lab only notifies the health department about positive cases. Does this change your assessment of the situation?

Published in Informatik Spektrum 43 (5), 2020, pp. 352–353, doi: https://doi.org/10.1007/s00287-020-01308-w

Translated from German by Lillian M. Banks

Scenario: But the Bot Said…

Constanze Kurz & Debora Weber-Wulff

Chris and Rose work in a robotics team at a mid-sized toy manufacturer. In keeping with contemporary trends, the company has been expanding its online electronic gaming capacities for the past several years. Chris and Rose belong to a small group of employees who design and construct animated stuffed toys that are explicitly marketed—though not exclusively—to children.

Most of the animals resemble caterpillars or worms because it’s easier and more efficient to build robots for autonomous movement that way. At the same time, it reduces the risk of injury in small children. They became a commercial hit not only because they were soft and cuddly but also because they were interactive and could talk and sing. As an added feature, a built-in acoustic monitoring function connects to a smartphone—the parents’ phone, for example. Whenever you leave the room, the stuffed animals act as low-profile baby monitors.

Currently, Chris is working on a new version of an animated caterpillar whose software will feature new forms of interaction. Parents will be able to upload an ever-increasing number of educational games, quizzes, and puzzles from their computers or smartphones. Adaptive speech recognition specially tailored to children will process answers entered using large buttons attached to the caterpillar’s body.

Rose tests each new version of the robotic caterpillars. Her focus is on safety—that is, to guarantee that none of the caterpillars’ motions pose any danger. They must not crawl too quickly. They can sense when they are being picked up, so their movements adapt accordingly. The results are exceptional: there’s nothing remotely dangerous about the product. Children in the test group could scarcely take their hands off the colorful stuffed animals.

But Rose discovered a problem that she didn’t mention to Chris or her boss at first; instead, she shrugged it off as a quirk. In a series of new puzzle games, the software spewed out nonsense instead of giving the correct answers. Rose chuckled the first time she heard one of the children say that a penguin wasn’t a bird but rather a breed of dog. When she asked the girl about it, the kid insisted: “But that’s what ‘Wurmi’ said.” The little girl was devastated when Rose told her that was flat-out wrong.

But as the number of incorrect answers mounts, Rose puts the problem on the team meeting agenda and asks who is responsible for fact-checking the robots’ answers. Ample research has long since established that children place a lot of trust in their electronic pals.

Chris is slightly annoyed and responds that they had sought out a software provider certified for creating children’s toys. That provider had developed an artificial intelligence program specifically tailored to generate hundreds of quiz questions, which have also been automatically translated into scores of different languages. So there’s nothing they can do about it now. What did Rose expect them to do? Go through and listen to every single quiz question individually? Not a chance!

Rose fires back: “But the parents can load updates to their phones, so making corrections shouldn’t be a problem.” Chris argues that the game software isn’t even their area of expertise—“we’re just supposed to make the hardware for these robots and tend to the locomotion—making sure the robots are programmed to make the right movements.”

Rose is confounded by this degree of ignorance—after all, they are dealing with tender-age children. So she takes another stab at intervening. To nip the ensuing debate in the bud, Rose’s boss, Anne, announces that they’ll look into the games. Rose has no idea what that means—only that the topic has been shelved.

So Rose decides to look into this software company certified for developing children’s games. She wants to know how the questions are generated. At a meetup, she befriends Henri, a guy who works for the company. Henri has no qualms telling her they don’t use AI to generate the questions but instead use an open-source knowledge base.

Rose looks into it and is shocked to discover that someone has made an entry stating that penguins are a dog breed. Anyone can enter whatever nonsense they like, and no one checks the content for accuracy. On a whim, she decides to change the superclass from “dog” to “cat,” knowing that the software will be updated the following week. Let’s see if the change will stick.

The next time this question comes up—there it is: Wurmi spits out “cat” as the correct answer. What should Rose do? Wurmi’s sales are through the roof, and the team is already working on the next project.

Questions:

  1. Is there an ethical issue with Chris’s job of manufacturing and programming a children’s robot whose software is delivered by a third party? Can you distinguish between the children’s robot and its gaming software?
  2. Does it matter that Chris doesn’t even know precisely what kind of software will be installed for his robots? Is he obligated to find out more about it?
  3. Is it ethical to neglect to consider the inexperience and naivete of young children?
  4. Should Rose have reported the problem immediately involving incorrect answers? She initially notes it as a quirk but only follows up later. Is that a problem?
  5. Is Rose out of line for asking questions and pressuring the team? After all, the software is entirely out of her lane.
  6. Shouldn’t Rose have taken Anne at her word? Was it okay for her to have done her own research?
  7. Was it okay for Rose to make friends with Henri to extract information about his company?
  8. Wouldn’t it have been better if Rose had at least entered “Bird” instead of “Cat” into the knowledge base? As it stands, she only succeeded in allowing the nonsense to continue.
  9. Shouldn’t open-source knowledge bases check their contents for accuracy? Is that even possible?
  10. Should special care be applied to systems designed for children? Could it be that inaccurate information can have a sort of “negative formative effect”?
  11. Other media for children have editors who are responsible for fact-checking content. Shouldn’t this also be the case here? We wouldn’t entrust the content of children’s TV programming and textbooks to some anonymous source; why would we allow that in this case?
  12. How can the quality of educational toys/games be regulated? Should the hardware—like this caterpillar bot—be equipped with open ports to allow everyone to load their own educational programs? Children also develop “emotional attachments” to their stuffed animals and toy robots. How great is the risk that children might be indoctrinated to hold racist beliefs, for example, or conspiracy theories?
  13. Adding a supplemental audio monitoring function that works with a smartphone may seem like a practical aid for childcare and risk prevention in the home. But when children take the stuffed animal-bot-caterpillar to a daycare center or to friends’ homes, it can quickly turn into a listening device that is not recognizable as such, the way a baby monitor would be. How should this conflict be handled?

Published in Informatik Spektrum 45 (2), 2022, pp. 121–122, doi: https://doi.org/10.1007/s00287-022-01441-8

Translated from German by Lillian M. Banks

Scenario: What is true? Data, Graphics and Truths

Christina B. Class, Andreas Hütig & Elske M. Schönhals

Andrea, Alex, and Sascha come from the same small town and have been close friends since elementary school. After high school, they moved to far-flung parts of the country. All the more reason for them to relish their annual get-togethers on December 22, when they would sit around talking for hours on end. For the past few years, nothing ever stood in the way of their annual get-together: neither their various study abroad trips, nor their jobs, nor family. December 22 was reserved for old friends and Christmas Eve for parents: they wouldn’t miss it for a thing, not even the coronavirus. They’re seated at a safe distance from one another, having a beer in Sascha’s parents’ living room: Sascha is at the dining room table near the kitchen, Alex is on the sofa across the room, and Andrea has made herself comfortable in the armchair beside the fireplace.

Andrea has just started showing the others her latest project. She and a fellow student, Maren, are working on a data visualization app. As their unique selling point (USP), they’ve gone to great lengths to develop a user interface that will appeal to users who would rather not deal with data or data visualization. No prior knowledge of programming or statistics is required; every trace of code remains hidden from view. The user selects their preferred filters, and the app allows them to present the data in various visual formats.

The user can get creative with charts, colors, ratios—even 3D graphics—to display data. The way the data is represented is easily changed or adapted to suit user needs. The goal is to allow the user to create graphics quickly and simply so they can either be sent to a computer or shared directly on social media networks using share buttons. After all, the ability to back up specific themes and theses with suitable statistics and infographics is becoming ever more important.

Sascha, who works for a consulting firm, tests the app on Andrea’s tablet and is thrilled: “Man-o-man, Andi! Why didn’t you have this thing ready a couple of weeks ago?! We had to put together an interim report for one of our clients and needed to compile all the data from our market analysis to support our strategic recommendations. Man, that was a lot of work! And Tommy, the project director, was impossible to please. The graphics never illustrated what he was trying to communicate quite how he wanted. It was such a pain fiddling with all those options and parameters.”

“Yeah, Sasch’,” Andrea answered with a grin, “we don’t make quite as much as you! Otherwise, we could hire a few more people and knock this stuff out more quickly.” Sascha hands Alex the tablet so he can look at it, too. Alex is as fascinated as Sascha. While Andrea and Sascha discuss the app’s market potential, Alex is thoroughly engrossed in testing its many functions. But the look on his face gradually turns sour. He furrows his brow the way he does whenever his thoughts wander off the deep end—a habit they’ve often joked about.

Suddenly, Sascha turns to him and asks: “Hey, what’s up? Is something wrong?” Alex looks up, stares straight at Andrea, and says: “I don’t know. I have a bad feeling about this app, Andi. It runs like a charm and simplifies everything. The graphics look super professional and persuasive. But isn’t it almost too good? I can play around with all these options and snippets long enough to use the same data to create graphics that lead to opposite conclusions. That can’t be good.”

“Why not? That’s precisely the point,” Sascha says. “You have no idea how much effort goes into configuring graphics to illustrate precisely what you want them to. That’s what’s so brilliant about it: the user interface is so streamlined that it no longer requires specific skills to generate the graphics you need. You have to know what the graphics are supposed to show—the app does the rest for you.”

“Yeah, but that means that the graphics might end up showing something that’s not true; it might even mean that the data can be manipulated to extract insights that aren’t necessarily true.”

“Nonsense!” Andrea continues, “We don’t delete or change any of the data. And besides,” she adds, “the whole purpose is to make certain facts stand out. You know what they say: a picture’s worth a thousand words.”

“Yeah, right!” Alex barks back derisively, “Typical consultant! Just what I thought you’d say!”

There’s a minute of dead silence before Andrea asks, “Hey Alex, what’s the deal? What’s that about?! Sasch’ hasn’t done anything wrong.”

Alex takes a deep breath, “Yeah, I know. I’m sorry, Sascha, I didn’t mean it that way. It’s just that I’m genuinely ticked about this. Remember two years ago when Micha and I opened an escape room and an adventure pub? It was going well. Until. Well, you know. Until Covid came along. We tried to stay afloat with online offerings. Even that was getting off to a good start; then some folks managed to steal our ideas… what ya gonna do?…we had to devise a new plan.”

“Then we got an offer for a new VR space. It sounded great. We installed a test version and spent three weeks testing for performance and quality using various subscription-free experiences. It was all very promising. So we signed the contract. But these *$@!’s had presented the data in a way that glossed over all the problems. They were either tucked away in corners with tiny graphics or smoothed over with a best-fit curve. We didn’t pay much attention to it during the presentation. The system runs like crap, and we’re likely to lose our shirts over it. We’ve already seen an attorney. But since they didn’t falsify the data, they aren’t guilty of fraud. If jerks like this get their hands on an app that can do all this, it’s game over.”

Andrea and Sascha stand there staring at each other in silence.

Finally, Sascha says, “Dude, I’m so sorry to hear that! Unfortunately, bad apples are a dime a dozen—even in our field. But there’s nothing Andrea’s great app can do about that, is there? Ultimately, it’s your job to crunch the numbers and do the math, no matter how good the graphics look. Next time, why don’t you let me look at them before you sign anything?”

Andrea agrees, “Sure, if you feed the right data into our app, you can get it to show you things that are ultimately misleading, or that look different taken out of context. But anything can be used for nefarious purposes, right? You can’t put that back on our app.”

But Alex wonders, “Aren’t you taking the easy way out here? Remember that not everyone has had as much statistics background in their education. All these numbers and fancy graphics make everything look so much more convincing, yet what they represent is only a fraction of the bigger picture. And ultimately, no one can make sense of it anymore—not even the app users! Where’s the accountability?”

Questions:

  1. Sascha thinks a picture is worth a thousand words. At the same time, though, essential details often get lost in translation. Have we all grown accustomed to taking in everything with just one look? Why do we prefer to see graphics and images over numbers and data? Are we still willing to engage with the details behind the numbers?
  2. The app promises to simplify data visualization. What practical applications might this have beyond pretty pictures (and marketing campaigns)?
  3. Andrea and Maren’s app also allows users to export graphics to social networks, which is precisely where myriad half-truths, fake news, and falsified numbers circulate. Most dangerous are false and/or distorted statements based on accurate but incorrectly interpreted data. Would an infographic app like this tend to accelerate this trend or counteract it? What changes to the app could Andrea and Maren make to help support substantive content instead of simply rendering rote speculation more plausible?
  4. In 2014, Lisa Zilinski and Megan Nelson published a “Data Credibility Checklist” [1]. What might the minimal prerequisites for using data to construct graphics entail?
  5. What criteria must a graphic meet for you to trust it? Where should the data come from? What should be taken into account? What tracking or verification options would you like to have?
  6. What are the implications of these checklist items for data graphics creators? Who is responsible for ensuring that graphics are interpreted correctly?
  7. On its face, accountability is informed mainly by a sense of agency. Someone is accountable to someone else for something adjudicated by a particular authority according to an agreed-upon norm. But what about this instance, where the programmers cannot know what the users may do with the app they created? Can you be called to account for something you do not know might happen? Or should they be required to at least minimize the likelihood of misuse or make it more difficult? If so, how might Andrea and Maren go about achieving that end?
  8. If accountability can no longer be traced to any given “agents,” would one solution be implementing regulation at the system design level? Or are those types of interventions ineffective and fundamentally overreaching?

References:

Börner K, Bueckle A, Ginda M (2019) Data visualization literacy: definitions, conceptual frameworks, exercises, and assessments. Proc Natl Acad Sci USA 116(6):1857–1864. https://doi.org/10.1073/pnas.1807180116.

Zilinski LD, Nelson MS (2014) Thinking critically about data consumption: creating the data credibility checklist. Proc Am Soc Inf Sci Technol 51(1):1–4.

Published in Informatik Spektrum 44 (1), 2021, pp. 62–64, doi: https://doi.org/10.1007/s00287-021-01337-z

Translated from German by Lillian M. Banks

Scenario: Developing Software with your AI Assistant

Christina B. Class, Otto Obert & Rainer Rehak

Are you looking for a little help from AI? These days, many software developers are doing just that. But how much can you trust generative tools? The following scenario illustrates how important it is to pose this question early on.

Three weeks ago, André was hired as a developer by Smart4All, a small firm specializing in custom software solutions for small to mid-sized companies. Recently, there has been an increased demand for AI-based services, both internally and externally, and André has been assigned to work in this area. He’s a newly minted BA who did reasonably well as a business and information technology major. At the moment, there is no shortage of IT job offerings. And yet, here he is—a guy whose strong suit wasn’t exactly statistics, programming, and AI—working in an IT department. Suffice it to say that while in school, he profited greatly from collaborations with his fellow students, especially in these areas.

He got lucky with his bachelor’s thesis: he completed the work at a mid-sized company where his job was to evaluate the potential for data mining and AI to minimize costs and optimize the preparation of proposals. By searching various blogs and the code-sharing platform CoDev, André was able to find most of the code he needed. Then came the first version of Easy-AI-Code-Pilot—a tool for automatically generating code that was first introduced on CoDev. While he was skeptical at first, before long he no longer had any qualms about using it all the time. It was no small task to get enough of a grip on the individual fragments to combine them to do what he wanted. At the time, the company was happy with his work and wrote him a glowing letter of recommendation.

Now, André is sitting at his new desk, staring out the window, when the team leader, Verena, comes in, smiles, and says she has an important assignment for him. BioRetail, a major nationwide distributor of organic products, has contracted Smart4All to develop new software solutions to integrate existing in-house programs for customer management, accounting, ordering, and warehousing. The client wants a solution to forecast incoming orders in a B2B format. There was data to prepare, processes to test, and everything to be documented in Python Notebooks…the usual. Verena flatters him—“That’s right up your alley!” she says. He subtly hints that these aren’t exactly his core competencies and that, while completing his bachelor’s degree, he’d relied primarily on such resources as blog entries and mainly used CoDev and Easy-AI-Code-Pilot to generate code. Verena grins at him and says, “That’s what everyone’s doing nowadays.”

André’s concerns thus fade, and he gets to work. Understanding and cleaning the data is no small feat for him, but he finds a snippet of code from a hackathon that involved scrubbing very similar types of data. He uses various code snippets culled from the internet that apply different cleaning methods. He tests the code snippets and evaluates them on the basis of the usual quality-control criteria. Easy-AI-Code-Pilot offers good suggestions for small subtasks, but André struggles to integrate all these different pieces of code. Even though there are times when he’s not quite sure of himself, in the end everything looks plausible and consistent enough. However, he cannot rule out that he has overlooked or incorrectly assigned one thing or another among all the data and code fragments. Nor does he adhere to any strict separation between training, validation, and test data sets. Time is ticking, and he brushes aside his creeping doubt because the results look convincing. He had, after all, run tests on various models using multiple hyperparameters for each one, and he documented everything properly in Python Notebook files. He may not have come up with the perfect solution, but André is confident that the result is not too shabby.
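The separation André skips is not hard to set up. Below is a minimal sketch of the idea, assuming scikit-learn and an invented cleaned data file orders_cleaned.csv with a target column order_volume; none of these names come from the scenario itself.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical cleaned data: order features plus the quantity to forecast.
orders = pd.read_csv("orders_cleaned.csv")
X = orders.drop(columns=["order_volume"])
y = orders["order_volume"]

# First split off a held-out test set (20% of the data) ...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
# ... then carve a validation set out of the remainder (0.25 of 80% = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42
)

# Models and hyperparameters are compared on (X_val, y_val) only;
# (X_test, y_test) is touched exactly once, for the final evaluation.
```

Keeping the test set untouched until the very end is precisely what would have told André, before BioRetail did, that his patchwork did not generalize to new data.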

Three weeks later, Verena and André are called in by Frederic, the IT product director and account executive for BioRetail. When they enter the room, they see Geraldine—the representative from BioRetail. The atmosphere is cold, and Frederic asks everyone to have a seat. Then Geraldine begins: at first, she was enthusiastic about the forecasting notebooks, but when she tried analyzing more recent data, they were spitting out results that made no sense whatsoever. Since the new data looked slightly different, she wanted to adjust the notebooks herself. That’s when she noticed that different program components were all based on different features. Then she saw even more inconsistencies, so she sat down with her IT people and reviewed everything. She couldn’t believe her eyes. The code was awful; it had zero homogeneity, and the data models were way too different. The documentation was atrocious. Everything was a hodgepodge slapped together from various methods that couldn’t function reliably. It was totally unacceptable, and any company capable of delivering such a slipshod product was certainly in no position to integrate multiple programs. Verena and Frederic exchange glances, then look at André and say: “So, you’ve got to have some explanation for this, right?”

Questions:

  • What is the value of using such tools as Easy-AI-Code-Pilot to generate code automatically?

  • What basic principles should be followed when using these assistance systems?

  • Should André have been more diligent about telling his team leader he wasn’t qualified to take on the assignment?

  • What was Verena’s role in this? As team leader, what should she have done better? Who is most at fault for the whole fiasco?

  • Does the platform provider CoDev have any responsibility for making a product like Easy-AI-Code-Pilot accessible free of charge? Is it enough for CoDev to display a text warning about potential product misuse?

  • What steps should providers of these kinds of tools take to live up to their obligations?

  • What do you think about Verena humiliating André in front of the IT product director and the BioRetail representative? What ethical principles should apply to management personnel in this situation?

  • Is it even possible to implement ethical principles in AI? What are the implications for the way we deal with AI? What might a code of ethics look like?

Published in .inf 05. Das Informatik-Magazin, Spring 2024, https://inf.gi.de/05/gewissensbits-softwareentwicklung-mit-kollege-ki.

Translated from German by Lillian M. Banks

Scenario: Statistical Aberrations

Christina B. Class & Stefan Ullrich

A little over a year ago, Alex completed his master’s thesis on artificial intelligence and facial recognition. His customizable, self-learning method substantially improved previous results for real-time facial recognition. Last year, after he presented his paper at a conference—including a proof-of-concept live on stage—he was approached by the head of AI Research and Development at EmbraceTheFuture GmbH. The company was founded three years ago to specialize in the development of custom software systems, especially intelligent systems and security systems. After graduation, Alex took a short vacation and accepted a position working for EmbraceTheFuture GmbH.

He’s currently working in a small team to develop facial recognition software for a new security system called “QuickPicScan” that will be used at airports by the German Federal Police. The faces of passengers at security checkpoints will be compared in real-time with mugshots of fugitives so that suspicious individuals can be singled out and subjected to more intense scrutiny. Authorities hope that this will allow them to identify passengers with warrants within the Schengen area, where there are no passport controls at the borders.

It’s also designed to accelerate the rate at which people are processed through security checkpoints. The system was trained using millions of images. Mugshots and images of criminal suspects are stored in a database that is accessed and updated anytime a new image is captured so the system can easily be kept up-to-date with the most recent search warrants. At the airport, low-resolution photos of all passengers are taken as soon as they pass through security.

Whenever the software detects a match, the metal detector is triggered to sound the same alarm used when it detects metal. The passenger then undergoes only the routine search, during which a high-resolution photo is snapped under better lighting. That image is again run through the system for potential matching. Only if this second test also produces a positive result is the passenger taken aside and subjected to a more thorough search in a separate room, where particulars are compared. The results of the second test are displayed on a control terminal. The photos of the passengers are not saved—a separate team is assigned to guarantee that these photos are deleted from main memory and cannot be accessed externally. QuickPicScan was tested extensively in simulations and with actors in a studio set-up staged to replicate the security checkpoint.

Based on these tests, the team estimates a false negative rate of 1%: of every 100 wanted individuals passing through the checkpoint, only one goes undetected. The false positive rate—the proportion of ordinary passengers incorrectly classified as suspicious—is less than 0.1%. Marketing director Sabine is delighted with these results. A margin of error of 0.1% for falsely targeted innocent subjects—that’s spectacular!

To test the system in real-world conditions, the company is coordinating with the police to conduct test runs over two months in the summer at a small airport—one that serves approximately 400,000 passengers per year. One of the client’s employees monitors the control terminal. “Mugshots” of 370 actors—of varying quality and in various poses—were taken and fed into the system.

During the two-month testing period, the actors pass through the security checkpoint 1,500 times at previously determined, randomly selected times. After passing through the checkpoint, they identify themselves at the control terminal so the system’s decision can be checked. Since the two-month period falls within the summer vacation, only 163,847 passengers are checked. The system incorrectly flags 183 passengers as suspicious. In eight of the 1,500 checkpoint passes by actors, the system fails to recognize the match.

Project manager Viktor is thrilled. While the false positive rate of 0.11% is slightly higher than initially hoped, the false negative rate of 0.53% is substantially lower than anticipated. EmbraceTheFuture GmbH goes to press with these numbers and a margin of error of 0.11%. The police announce that the system will soon be operational at a terminal in a major airport.
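For readers who want to check the arithmetic, the rates follow directly from the test-run figures above. The daily passenger count used for the large-airport extrapolation below is an illustrative assumption, not a figure from the scenario.

```python
# Figures from the two-month test run described above.
passengers_checked = 163_847   # ordinary passengers screened
false_alarms       = 183       # ordinary passengers incorrectly flagged as suspicious
actor_passes       = 1_500     # checkpoint passes by the 370 enrolled actors
missed_matches     = 8         # actor passes the system failed to recognize

false_positive_rate = false_alarms / passengers_checked   # ~0.11%
false_negative_rate = missed_matches / actor_passes       # ~0.53%
print(f"false positive rate: {false_positive_rate:.2%}")
print(f"false negative rate: {false_negative_rate:.2%}")

# Illustrative assumption: a major airport handling 100,000 passengers per day.
daily_passengers = 100_000
print(f"expected false alarms per day: {daily_passengers * false_positive_rate:.0f}")
# -> on the order of a hundred innocent passengers a day, the point Vera raises below
```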

That evening, Alex gets together with one of his old school friends, Vera, who happens to be in town. She is a history and math teacher. After Vera has brought Alex up to speed on the latest developments in her life and love interests, he gushes to her about his project and tells her about the press conference. Vera’s reaction is rather critical—she’s not keen on automatic facial recognition. They’d often gotten into this while he was completing his master’s degree. Alex is thrilled to tell her about how low the margins of error are, about the increased security and the potential for ferreting out individuals who’ve gone into hiding. Vera looks at him skeptically. She doesn’t consider the margin of error low. 0.11%? At a large airport, dozens of people will be singled out for closer inspection. And that is no laughing matter, in her view.

She also wonders how many people who’ve had their mugshots taken will likely be boarding a plane. But Alex doesn’t want to hear about it and goes on a tangent outlining details about the algorithm he developed as part of his master’s thesis…

A few months later, the system is installed at AirportCityTerminal. Security officials are trained to use it, and the press reports a successful launch. A couple of days later, Alex flies out of AirportCityTerminal. He’s already looking forward to passing through QuickPicScan—basking in the knowledge that he has contributed to improving security. But no sooner does he step through the security gate than the metal detector starts beeping. He’s asked to stretch out his arms and place his feet on a stool—one after the other—all while staring straight ahead. He peers at the security guard’s screen to his right and sees the tiny light of the QuickPicScan monitor blinking. Let’s hope this doesn’t take long—he’s cutting it close with his flight. They won’t wait for him since he hasn’t checked any bags, and he can’t afford to miss this flight. He’s taken to a separate room and asked to keep his papers ready while he stands there opposite a security guard. Alex tries to give the guy his passport, but the guard tells him to wait—he’s not the one in charge, and his colleague will be by shortly to take care of it. Alex is growing impatient.

He asks them to confirm his identity and is told no—it can’t be done because the officer on duty doesn’t have access credentials for the new system. It takes a full eight minutes for the right person to show up. Once his identity has been confirmed, it’s clear that Alex is not a wanted fugitive.

But his bags are nevertheless subject to meticulous search. “It’s protocol,” the woman in charge tells him. Alex is getting antsy. He’s probably going to miss his flight. Suddenly, he’s reminded of the conversation he had with Vera.

“Does this happen a lot?” he asks, feigning politeness.

“A couple dozen a day, I suppose,” she says as she walks him back to the terminal.

Questions:

  1. Alex was falsely identified as a “suspect” and missed his flight. This is referred to as a “false positive.” How much collateral damage from “false positives” will we take in stride? What kinds of fallout can falsely identified people be expected to accept? How would compensation for such instances be regulated?

  2. People make mistakes, too. Under similar circumstances, Alex could just as easily have been singled out for closer inspection by a human security agent. In principle, does it really make a difference whether it’s human error or machine error?

  3. People are prejudiced. For example, it’s well known that men who appear to be foreigners are checked more frequently. What are the chances that software systems will reduce this type of discrimination?

  4. Self-learning algorithms require training data, and their results are therefore heavily dependent on it. This can lead to discrimination being built into the algorithm itself.

  5. It’s also conceivable, for example, that facial recognition for certain groups of people is less precise because fewer images of them are available in the training data. This may involve anything from skin color to age, gender, facial hair, etc. A system like the one presented here could lead to an excessive number of people with certain physical features being singled out for closer inspection. What can be done to eliminate the potential for discrimination in training data? How might systems be tested for discrimination?

  6. Is there a conceptual difference between manifest discrimination built into a system and human discrimination? Which of the two is more easily identified?

  7. People tend to readily trust software-generated solutions and relinquish personal responsibility. Does that make discrimination by technical systems all the more dangerous? What are the possibilities for raising awareness about these matters? Should consciousness-raising efforts be introduced to schools, and if so, what form should this take? Is that an integral component of digital competency for the future?

  8. Figures for false positive and false negative rates are often given in percentages, so margins of error under one percent don’t sound that bad at first glance. People frequently find it difficult to imagine how many individuals would be affected in real life and what the consequences and impact may be. The figures are also often placed side by side without establishing the relationship between positives (in our case, the people actually being sought via mugshots) and negatives (in our case, the rest of the passengers). This ratio is often starkly unbalanced. In the test run described here, with a total of 163,847 people, 1,500 (positives) were identified, so about one in every 1,000 (1:1,000). Is this comparison misleading? Should these kinds of figures even show up in product descriptions and marketing brochures? Is it ethical for the responsible parties at EmraceTheFuture GmbH to go to press with this? Are there other means of measuring margins of error? How can the error rate be represented so systems can be realistically assessed? (A short illustrative calculation follows this list.)
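
The gap between a percentage and the number of people behind it becomes easier to grasp with a rough calculation. The following Python sketch is purely illustrative: the daily passenger volume, the size of the watch list, and both error rates are assumptions chosen for the example, not figures from the scenario or from any real product.

    # Back-of-the-envelope illustration of how a "small" error rate translates
    # into absolute numbers of affected travellers. All inputs are assumptions
    # chosen for this example, not figures from the scenario.
    passengers_per_day = 160_000   # assumed daily passenger volume
    wanted_persons = 50            # assumed number of people actually on the watch list
    false_positive_rate = 0.01     # assumed: 1 % of ordinary passengers wrongly flagged
    false_negative_rate = 0.10     # assumed: 10 % of wanted persons missed

    ordinary_passengers = passengers_per_day - wanted_persons
    false_positives = ordinary_passengers * false_positive_rate
    true_positives = wanted_persons * (1 - false_negative_rate)

    print(f"Ordinary travellers pulled aside per day: {false_positives:,.0f}")
    print(f"Wanted persons actually caught per day:   {true_positives:,.0f}")
    # With these assumed numbers, roughly 1,600 innocent people are stopped
    # every day in order to catch about 45 wanted persons - the base-rate
    # imbalance the question points to.

Expressed in absolute numbers like this, the same “one percent” error rate reads very differently than it does in a marketing brochure.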

Published in Informatik Spektrum 42(5), 2019, S. 367–369, doi: 10.1007/s00287-019-01213-x

Translated from German by Lillian M. Banks

Scenario: The Self-Driving Car

Christina B. Class & Debora Weber-Wulff

For years, they’ve been preparing for this. But now, 1950s-era dreams of a self-driving vehicle are finally coming true. They christened their creation “Galene”—the self-driving car. It performed like a champ on the test track. Even in test drives on American roads—for which Galene had to be shipped to the US—everything was swell. There were fewer regulations in the US, where endless stretches of highway and good visibility allowed ample room for experimentation.

Everything was more complicated in Germany, and obtaining the necessary permits for testing on public thoroughfares took longer. The press has been invited to tomorrow’s widely publicized “maiden voyage.” Jürgen, one of Galene’s proud “parents,” has gotten approval from his team leader to take his baby out for a spin on the planned course before the press gaggle gets underway, just to be sure everything runs smoothly. He’s a good engineer, so his planning has been meticulous. It’s a Sunday afternoon when these roads won’t have much traffic. And he’ll be seated at the wheel himself to intervene should anything go wrong. He’s confident he won’t call attention to himself or annoy other drivers or passersby.

He tells the voice computer where he wants to go, and Galene confirms the destination. Then she calculates the route, taking into account current traffic reports, known construction sites, and the weather forecast. Everything is good to go: there is no construction along the route, no rain, no fog, and only a slight breeze. It’s a sunny autumn day—perfect for the first test drive!

Jürgen is enjoying his ride along this route, which he knows well. It is a great feeling to let someone else do the driving, even though it still seems strange not to step on the gas, hit the brakes, or take the steering wheel. Galene enters the expressway flawlessly, passes a classic car, takes the next exit, slows to a crawl, and stops at the light. She always keeps her distance from the vehicle ahead of her. The distance control is so precise that she could be set to pull up within less than an inch of the car she is trailing. But that would put other drivers needlessly on edge, so Galene’s been programmed to maintain a distance of about 15 inches.

Jürgen would love to use his cell phone to film how he catches a “green wave” before he hangs a left at the third light—he wasn’t quite sure whether Galene would accurately calculate all the signals involved to make that happen, but she did: perfect! But if he pulled out his phone and started filming, he could hardly sustain the illusion that he was the one driving the car. As they enter a newer residential district, Galene reduces her speed to the 30 km/h limit. There’s a school on the left, with school bus stops on both sides of the street. They invested a lot of time preparing Galene to deal with this type of traffic scenario.

Luckily, it happens to be fall recess. To their right, they pass a park with sprawling grassy areas. Jürgen hears kids shouting and looks over: dogs romping, brightly colored balls bouncing across the grass, and even brighter kites flying overhead. When the wind blows them in his direction, Jürgen instinctively grabs the steering wheel, knowing as he does that children at play ignore traffic.

Especially when he first started “driving” the self-driving car, this happened a lot: he would get nervous, reach for the steering wheel, and switch to manual control so he could take over. But he never needed to, so he gradually learned to relax and leave the driving to Galene. Now, though, it suddenly happens: a kid, kite in hand, darts out into the street from between two parked cars and is hit by Galene. The child falls to the ground, unconscious.

Galene immediately hits the brakes because her sensors have detected the impact. At the same time, Jürgen pulls the emergency stop button. Galene comes to a halt, and the hazard lights are activated. Jürgen gets out and runs toward the child, whose mother soon appears and starts going off on Jürgen. A young woman gets out of the car that was driving behind Jürgen and begins administering first aid. She says she’s a nurse.

A dog owner visiting the park has already placed an emergency call, and the ambulance arrives promptly to transport the child and his mother to the nearest hospital, blue lights flashing. The police are also on the scene to file an accident report. Jürgen appears to be in a state of shock. The young woman who administered first aid approaches the police right away, even before they have a chance to question Jürgen. She tells them her name is Sabine and that she was driving behind the vehicle involved in the crash. She thinks it was going too fast. She herself was driving well below the 30 km/h speed limit—with all the kids playing in the park, the dogs chasing after balls, and the kites flying, you had to expect something like this to happen!

The police ask Jürgen for his license and registration. He gives them his ID, driver’s license, and the test drive permit. The officers are taken aback and start asking questions about the car—they’re intrigued. Since this is a test drive and the vehicle is not licensed for general operation on public roads, they insist on having Galene towed, not least because the data needs to be analyzed more thoroughly. Jürgen is sure that Galene followed the rules of the road, but the accusation made by the witness, Sabine, still weighs heavily on him. Tomorrow afternoon’s “maiden voyage” and press conference are in jeopardy. It’s a PR disaster—especially now, after the accident.

Questions:

  • The car had an official operating permit for the planned road test. Was it okay to take it out for a drive before that official test took place?
  • Airplane pilots are required to undergo recurrent training to ensure they can respond quickly in an emergency and take over from the autopilot. Will this type of training also be needed for self-driving cars? Should Jürgen have been permitted to sit back and relax during the road test?
  • As soon as Jürgen saw children playing in the park, he instinctively grabbed the steering wheel. As a driver, should he be required to take control of the vehicle in a situation like this, where he could expect children to run into the street?
  • Galene was following the 30 km/h speed limit, but the witness complained that this was too fast when there were so many kids playing in the park. When calculating speed, to what extent can and should algorithms account for activities along the roadway?
  • Unforeseen events will always cause accidents, whether a child running out into the street, an animal crossing the road, or a tree branch down on the roadway. Disaster is often averted by a driver’s quick reaction or instinctive hesitation. Should algorithms be programmed to emulate some form of instinct? To what extent can self-learning systems be of use in this regard?
  • Sometimes, rear-end collisions result from a driver following the rules of the road “too closely”: for example, stopping at a yellow light on a busy highway or sticking to the posted speed limit in the blind bend on an expressway exit ramp. Self-driving vehicles are programmed to adhere strictly to the rules. Should they be programmed with a built-in “bending of rules” based on the behavior of cars driving in front and behind them?
  • It would be impossible to test for every imaginable scenario, so the software in a self-driving car may respond inappropriately. In that case, who is liable? The developer? The manufacturer? The driver who is seated at the wheel “just in case”? Or would we take these cases in stride in exchange for the greater safety these cars provide in other circumstances? Where do we draw the line?
  • How and when should software updates be installed on self-driving cars? Only at the dealership, or wherever the vehicle happens to be, as long as it is stationary? Who oversees and determines whether and when an update has been installed? What happens if an accident could have been prevented had a software update been installed? And who is liable then?

Published in Informatik Spektrum 38(6), 2015, S. 575–577.

Translated from German by Lillian M. Banks

Scenario: Between Appreciation and Value Creation

Stefan Ullrich, Reinhard Messerschmidt, Anton Frank

From the initial idea through data collection to actual use, a data project often involves many different people. This scenario shows the moral difficulties that can arise along the way.

Matilda, Micha, and Meryem have been friends since their school days, so they are delighted to be able to do their voluntary ecological year (Freiwilliges Ökologisches Jahr, FÖJ) together at a public-interest organization called “Code Grün.” The organization has long been active in the civic tech field and wants to collect environmental data for a grant-funded project. “That’s just the thing for ‘M to the third power’—for us!” says Matilda; after all, the three of them have always been interested in nature.

Specifically, they are to measure temperature and air quality in small and mid-sized towns. When she sees the numerous measuring instruments, Meryem has an idea: “We could build the sensors into my wheelchair, and we could pack a few power banks in there too!” “Yes, and plenty of snacks and a small fridge for us,” Micha jokes. Matilda grins as well and suggests that, alongside the environmental measurements, they also record the state of accessibility: whether elevators are working, the condition of the sidewalks, and so on.

For the first few stops they are still thoroughly enthusiastic; it is exciting to get to know so many places in the surrounding area. The measurements are taken with a sensor kit that Code Grün developed together with an agency. During their measurement trips, they notice how closely social issues are tied to environmental ones. “Thanks to the street trees here, it’s almost four degrees cooler than on the market square,” Matilda remarks as they enjoy their well-earned ice cream under a parasol at the height of summer. “Well, the people in their air-conditioned cars couldn’t care less,” Micha comments. “All the lowered curbs and accessible streets here,” Meryem points to the small streets and alleys of the town center, “are really practical for me, and the woman over there with the stroller gets around more easily too. But it’s all in the blazing sun; no wonder the baby is crying. Why are there no street trees here, of all places?”

Toward the end of the year, however, the work becomes sheer routine; the fun of the early days has worn off, and they come to understand why this is an FÖJ assignment. To motivate themselves again, the three decide to write small web apps based on their data. They go about it rather amateurishly, as only Meryem has any programming experience, but it is great to see everything that can be read from their data. Even the outliers are quite amusing, mostly measurement errors or incorrectly connected sensors. Their supervisor at Code Grün finds this exciting and brings them together with the “devs,” as the development team is called there. With the data already at hand, dreaming up new web apps is even more fun.

In their final month, the three are asked to prepare a presentation and share anecdotes about their data collecting. The keynote speaker at the planned event, however, is someone from the partner advertising agency who wants to present a new app. The app is beautifully made: environmental data is brought to life, a “digital magnifying glass” can identify plants and insects, and the “future lens” provides forecasts of air quality for the coming hours and days.

One thing, however, gives Matilda, Micha, and Meryem pause: the app also provides routes for wheelchair users, based on the data the three of them collected. That goes unmentioned; it is presented as a “prototype based on data from Code Grün.”

The whole accessibility idea had been their brainchild, and now an agency is taking credit for it! “Well,” their supervisor says the next day, “the data is there for everyone; we can’t mention every single data collector by name. Besides, the app is free!” “Yes, free, but not open source. And the agency’s other apps are sold to municipalities at a steep price!” Meryem fumes.

She does not know why the agency’s app bothers her so much; it does exactly what it is supposed to do and is very practical. Still, she feels somehow exploited, even though everyone acted correctly and openly. She comes to realize that “M to the third power” would never have gotten a working app onto thousands of smartphones without the agency. Maybe she can at least get their names into the credits in a future update.

A few months later, when the three have almost forgotten their unease, Matilda happens upon an online article mentioning that the massively grown data pool of the start-up, which has since been spun off, is to feed into a newer app with broader functionality, one in which several German states have already signaled interest. A success story, isn’t it?

Questions:

  • It is an open secret that the value created from data rarely accrues to the developers and virtually never to the data collectors. Why is this an ethical problem?
  • It is not clear that the accessibility idea really originated with the FÖJ team; there are other data projects, after all. But assuming it did: would it be appropriate to mention all data collectors by name? Should they also be credited as the originators of the idea?
  • The start-up spun off from the agency can count on substantial public funding. How should it be assessed, from an ethical standpoint, that the data pool was built in part by an FÖJ team?
  • What changes, from an ethical perspective, when data is shared publicly as a digital commons, and how should its commercial use be regulated (or even ruled out)?
  • What kind of public digital infrastructure would be needed, and what would its users need, in order to implement such projects in a public-interest-oriented and ethically reflective way, to reuse them, and to operate and further develop them over the long term?
  • Environmental protection and social issues are interconnected, but do these topics nevertheless need to be discussed differently in ethical terms?

Published in .inf 06. Das Informatik-Magazin, Summer 2024, https://inf.gi.de/06/gewissensbits-zwischen-wertschaetzung-und-wertschoepfung.