Augustana researcher uncovers bugs in Jupyter Notebook

Lutellier said this is the first step to improving the popular software program.

An assistant professor of computing science at the University of Alberta’s Augustana campus has successfully identified common bugs in Jupyter Notebook, opening the door to more secure and accurate data exploration.

Jupyter notebooks are complex, interactive online documents that combine source code, equations, and output, such as images and graphs, into one space, making them popular among data scientists.

Thibaud Lutellier, alongside students Harsh Darji, Wenyuan Jiang, and Diany Pressato, found that these notebooks are especially prone to bugs due to their unique nature. Having many moving parts means there are more opportunities for something to go wrong.

Furthermore, most of the detected bugs were correlated with user-related factors, Lutellier said.

“That could include how many people are working on the notebooks, how much experience they have, or how often they modify a file.”

As such, Jupyter notebooks in their current form are not suitable for collaborative work.

The identified bugs stemmed primarily from improper environment setup and misuse of application programming interfaces (APIs), Lutellier said, meaning a notebook may run correctly at first but stop working over time.

“Think about, for example, an application that’s on an Android phone. Every time you have a new Android or iPhone operating system, you need to update your application so it keeps working on the latest phone.”

Lutellier to conduct further research on the software program

At the start of the project, the team set out to answer three research questions: which notebook characteristics correlate with bugs, what types of bugs are most common, and what security vulnerabilities are most prevalent. With these goals in mind, their methodology comprised four main stages.

The first step was mining nearly 9,000 open-source notebooks for notebook characteristics and code changes. Second, they correlated the identified bugs with specific code and project features. The third step was developing a taxonomy of the identified bugs, and the final step was investigating security issues within Jupyter Notebook.
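To give a sense of what mining a notebook for characteristics can involve, here is a minimal sketch in Python. It is not the team's tooling. It relies only on the fact that a notebook file is a JSON document containing a list of cells. The function name `notebook_characteristics`, the sample notebook, and the chosen metrics (including the out-of-order execution flag) are illustrative assumptions, not features the study is known to have measured.

```python
import json

# A tiny stand-in for the contents of an .ipynb file (hypothetical sample data).
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Demo"},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "outputs": [], "source": ["import math\n", "x = math.sqrt(2)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}],
         "source": "print('hi')"},
    ],
})


def source_text(cell):
    """Return a cell's source as one string (it may be stored as a list of lines)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_characteristics(nb_json):
    """Compute a few simple, illustrative metrics from notebook JSON text."""
    cells = json.loads(nb_json).get("cells", [])
    code_cells = [c for c in cells if c.get("cell_type") == "code"]
    markdown_cells = [c for c in cells if c.get("cell_type") == "markdown"]

    # Execution counters of the code cells that were actually run.
    counts = [c["execution_count"] for c in code_cells
              if c.get("execution_count") is not None]

    return {
        "code_cells": len(code_cells),
        "markdown_cells": len(markdown_cells),
        "code_lines": sum(len(source_text(c).splitlines()) for c in code_cells),
        "cells_with_output": sum(1 for c in code_cells if c.get("outputs")),
        # Assumption: counters that are not ascending suggest cells were run
        # in a different order than they appear on the page.
        "executed_out_of_order": counts != sorted(counts),
    }


stats = notebook_characteristics(SAMPLE_NOTEBOOK)
print(stats)
```

Run over thousands of files, metrics like these could then be compared against each project's bug-fix history, which is roughly the kind of correlation the second stage describes.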

Since notebooks are used by a wide array of people, not just data scientists, one challenge the team had to overcome was ensuring they analyzed only relevant data. For example, they had to filter out of their sample any notebooks that were used only for personal projects.

This is the first time Jupyter Notebook specifically has been scrutinized for bugs, Lutellier said.

“In a sense, we did the easy part. We didn’t provide the solution yet.”

He is now developing an artificial intelligence (AI) tool to automatically and accurately detect bugs in Jupyter notebooks, which he said is outperforming other models such as ChatGPT.
