To create an inclusive community that successfully tackles issues like discrimination, we need open lines of communication between faculty and students. When that communication breaks down, a misunderstanding can turn ugly and prevent real progress. We saw an example of this recently when Professor Satyen Kale assigned his machine learning class a project: to train classifiers on the New York Police Department's stop-and-frisk dataset.

Stop-and-frisk was a controversial NYPD interrogation program that disproportionately targeted young black and Hispanic men and was ruled unconstitutional and discriminatory by a federal court in 2013. The assignment, satirically titled "Help design RoboCop!" was an exercise using records of searches conducted by the NYPD. The task was to create a program that could accurately predict whether an officer would make an arrest based on features including location, date, race, sex, and the build of the person who was searched.
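For readers unfamiliar with the mechanics, the assignment's task is a standard supervised-learning pipeline. The sketch below is my own illustration, not the assignment's code: it uses synthetic data in place of the real NYPD records, and scikit-learn as the (assumed) toolkit.

```python
# Hypothetical sketch of the assignment's task: predict whether a stop
# leads to an arrest from recorded features. Synthetic data stands in
# for the actual stop-and-frisk records.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 1000
# Categorical features loosely mirroring the dataset's columns
precinct = rng.integers(1, 78, n).astype(str)
race = rng.choice(["black", "hispanic", "white", "other"], n)
sex = rng.choice(["M", "F"], n)
build = rng.choice(["thin", "medium", "heavy"], n)
X_raw = np.column_stack([precinct, race, sex, build])
y = rng.integers(0, 2, n)  # 1 = arrest made, 0 = no arrest (synthetic)

# One-hot encode the categorical features, then fit a simple classifier
X = OneHotEncoder(handle_unknown="ignore").fit_transform(X_raw)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

Note that on real data such a model would reproduce whatever biases the past arrest decisions contain, which is exactly the pedagogical point at issue.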

While the assignment was couched in satire, it had a serious mission: to illustrate the difficulties in developing any kind of "predictive policing" tool. Unfortunately, this was not stated explicitly, leading ColorCode, a recently formed student group representing people of color in tech, to accuse Professor Kale of practicing a "racist, ahistorical, and irresponsible pedagogy." They raised the issue on social media and demanded that Professor Kale apologize and remove the assignment.

The misunderstanding could have been handled with a conversation or an email. Instead, a lack of dialogue between the student group and the professor deprived students of an important educational opportunity: to study the stop-and-frisk dataset using machine learning techniques.

This data records more than a decade of discriminatory searches by the NYPD. By learning from it and publicizing those inferences, we can enrich public discussion on police racism and brutality with hard statistics. An open examination of this dataset has the potential to influence policy and reduce police abuse.

I feel that we owe it to the victims of the stop-and-frisk program to study this data seriously. It may even be possible to learn something that helps prevent systematic racial profiling in the future. In my opinion, a machine learning class at New York's Ivy League school is the ideal forum in which to analyze this dataset about a dark aspect of New York's public safety.

Machine learning techniques are now starting to be used—and abused—by banks, governments, and corporations. That means this kind of data analysis is becoming increasingly relevant. Researchers are trying to figure out how to design machine learning methods that don't accidentally discriminate. Others are trying to catch people who hide racist policies behind opaque algorithms. This assignment was a great way for students to see these issues firsthand. This is intersectional research of the best kind: Technology meets a pressing social issue.
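One of the simplest audits those researchers use can be shown in a few lines. The example below is a sketch of my own (the function name and toy data are mine, not from any particular library): it measures the demographic-parity gap, the largest difference in a model's positive-prediction rate between groups.

```python
# Hypothetical sketch of a basic fairness audit: compare a model's
# positive-prediction rate across demographic groups (demographic parity).
import numpy as np

def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Toy data: group "a" is flagged 75% of the time, group "b" only 25%
preds = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_gap(preds, groups))  # 0.75 - 0.25 -> 0.5
```

A gap near zero means the model flags all groups at similar rates; a large gap is one warning sign of the kind of algorithmic discrimination the assignment was meant to surface.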

To be fair, the text of the assignment did not make this intent clear. But Professor Kale has subsequently stated that the assignment was satirical and that he is strongly against stop-and-frisk. He writes in a statement on his website that the assignment's setting in a "dystopian future was meant to be an ironic indicator" that "the data only reflects the arrest decisions of past police officers, which are decidedly not what one would want to imitate."

Although Kale could have been more forthright about his intentions, it is unfortunate that his attempts to bring socially relevant research into the classroom were met with online acrimony.

One encouraging takeaway is that both Professor Kale and ColorCode had good intentions. It falls on both sides to keep lines of communication open. For example, ColorCode could have reached out to Professor Kale to clarify his intentions before going public. Such a conversation could have indicated to both parties that they shared the same essential goal. Perhaps then Professor Kale could have updated the assignment with the appropriate context instead of having to remove it altogether, and ColorCode's first statement could have been just as well-informed as their second statement, which was published after Professor Kale's response.

This good-faith approach still leaves room for outcry and protest against a genuine racist—we had just better make sure we are dealing with one first.

The author is a School of Engineering and Applied Science MS student studying computer science.
