Behind the Build: Jupyter Notebooks in Our Data Science Courses



If you’ve taken one of our data science courses, you’ve probably seen Jupyter Notebooks, the industry-standard workspaces for developing data science code and visualizations. Jupyter Notebooks are built right into lessons in many of our courses and paths, such as Getting Started with Python for Data Science and Introduction to Big Data with PySpark, so you can get hands-on, realistic experience working with the tools that professional Data Scientists use in the wild.

Nick Duckwiler, a Software Engineer II at Codecademy, led the tech team that was tasked with integrating Jupyter Notebooks into the learning environment (the interactive platform that you see when you’re taking our courses or paths). It was a unique, multi-layered challenge that took a year to complete from kickoff to release. “There was sort of a blueprint for how to do this, but a lot of it we had to figure out ourselves,” Nick says.

Here’s an inside look at how Codecademy engineers built Jupyter Notebooks into our courses and paths, the hurdles they faced in the process, and the lessons they learned from building this feature.

The project: Get Jupyter Notebooks in the Codecademy learning environment.

Normally, the Jupyter Notebook web app runs locally or on a cloud provider, “which is hard on its own,” Nick says. The team had to add Jupyter Notebooks as a component in the learning environment, connect it to our existing functionality, and have it communicate with the Jupyter Notebook server, so that learners could write, save, and evaluate code all while taking a course.
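
For readers wondering what gets written and saved: a notebook file (.ipynb) is a JSON document made up of cells. The sketch below builds and serializes a minimal notebook using only Python's standard library. The field names follow the public nbformat 4 layout. The helper names make_code_cell and make_notebook are made up for illustration and are not Codecademy's code.

```python
import json


def make_code_cell(source: str) -> dict:
    """Build one unexecuted code cell in the nbformat 4 layout."""
    return {
        "cell_type": "code",
        "execution_count": None,  # stays None until a kernel runs the cell
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells: list) -> dict:
    """Wrap cells in the top-level notebook structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook([make_code_cell("print('hello, learner')")])
serialized = json.dumps(notebook, indent=1)  # the text a save would write to a .ipynb file
restored = json.loads(serialized)            # the structure a load would read back
```

Evaluating a cell is a separate step: the notebook server hands the cell's source to a kernel and stores the results in that cell's outputs list.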

There were three projects nestled inside this larger assignment:

1. Enable Jupyter Notebooks to run on our back-end infrastructure
2. Make sure the front end of the product looks presentable
3. Add functionality so learners can test their code and surface a solution
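
To make the third project more concrete, here is a minimal sketch of one way a code check could work: run the learner's code, capture what it prints, and compare it with the output of a reference solution. The article does not describe how Codecademy's checks actually work, so everything here (run_cell, check_submission, the feedback strings) is a hypothetical illustration.

```python
import contextlib
import io


def run_cell(source: str) -> str:
    """Execute a code string in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # illustration only; real learner code belongs in a sandboxed container
    return buffer.getvalue()


def check_submission(learner_source: str, solution_source: str) -> dict:
    """Compare the learner's printed output with the solution's (hypothetical check)."""
    try:
        actual = run_cell(learner_source)
    except Exception as error:  # report the failure instead of crashing the checker
        return {"passed": False, "feedback": f"{type(error).__name__}: {error}"}
    expected = run_cell(solution_source)
    if actual == expected:
        return {"passed": True, "feedback": "Output matches the solution."}
    return {"passed": False, "feedback": f"Expected {expected!r} but got {actual!r}."}


result = check_submission("print(sum([1, 2, 3]))", "print(6)")
```

Comparing printed output is the simplest possible check. A production system would more likely inspect variables or cell outputs returned by the kernel.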

Investigation and roadmapping

“There was an initial step to just check that this was possible on the back end, because a Jupyter Notebook runs in our containers. It took three or four months of work to investigate that. Early on, it took me a while to understand really what Jupyter Notebooks is and what it means when you run Jupyter Notebooks.

During our sprints, we do these things called spikes, where one or two engineers will explore how hard something is, if something is possible, or how much work it’ll require if we do it. You just kind of poke around the code thinking, What do we need to get this going? Usually, that results in some proof of concept.

Then we make a project plan for all the work that needs to be done. Each project has a technical lead, an engineer who takes the user requirements or the designs and figures out how it gets done. We also usually have help from a Project Manager to figure out what should be prioritized first and who should be working on what. A lot of planning happens in Jira, where we will make epics or sagas, which are large collections of tickets. Then we break it down into smaller bits until there are specific tickets that an engineer can pick up and do within a few days.”

Implementation

“I had a huge Notion doc that I would write down all my findings in, and they would eventually result in a PR, which is some amount of code that I was going to change that would get reviewed by someone else and shipped. The front end of our courses and paths is built with TypeScript using React, Redux, and Next.js frameworks. And then the back-end services that are related to this project are written in Golang.

[Image: A peek into the Notion doc that Nick used to chronicle his findings.]

A typical day involved a lot of coding, researching, and looking at the Jupyter docs. I would work on it for a while, and then if I got stuck, I’d ask another engineer who has more experience with different parts of the codebase for help. A lot of engineering is just solving headaches.”

Troubleshooting

“Something that was super frustrating on this project was proxying, which is basically forwarding requests between two services and authenticating them. It was a whole new infrastructure concept that I had never dealt with before.
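
As a rough illustration of the proxying idea described here (authenticate a request, then forward it to another service), the sketch below models both services as plain Python functions so it runs without a network. The Request and Response classes, the token table, the X-Forwarded-User header, and the path are all assumptions made for the example. The article does not show how Codecademy's proxy is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


def notebook_server(request: Request) -> Response:
    """Stand-in for the upstream service the proxy forwards to."""
    user = request.headers.get("X-Forwarded-User", "unknown")
    return Response(200, f"{request.path} served for {user}")


def make_proxy(upstream, valid_tokens: dict):
    """Return a handler that authenticates a request, then forwards it upstream."""

    def handle(request: Request) -> Response:
        token = request.headers.get("Authorization", "")
        user = valid_tokens.get(token)
        if user is None:
            return Response(401, "unauthorized")  # rejected before reaching the upstream
        # Drop the raw credential and tell the upstream who is calling (hypothetical header).
        forwarded = {k: v for k, v in request.headers.items() if k != "Authorization"}
        forwarded["X-Forwarded-User"] = user
        return upstream(Request(request.path, forwarded))

    return handle


proxy = make_proxy(notebook_server, valid_tokens={"Bearer abc123": "learner-42"})
ok = proxy(Request("/notebook/lesson.ipynb", {"Authorization": "Bearer abc123"}))
denied = proxy(Request("/notebook/lesson.ipynb"))
```

The debugging pain comes from chaining several of these hops: each one can reject or rewrite a request, so a failure at the end says little about which hop caused it.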

When a learner logs on to Codecademy and is in a course or path, they’re connected to a container, which is like a mini virtual computer. The container has to go through a service that authenticates that user and assigns them a set of computing resources. And then those computing resources have their own sort of agent that manages those resources and makes sure that you don’t run anything bad. And if you are using it for more than an hour, it gives you a new container.
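
The container lifecycle in that description can be sketched as a small allocator that hands each user a container and replaces it once it is more than an hour old. This is a guess at the general shape only: ContainerPool, its method names, and the exact expiry rule are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
import itertools
from dataclasses import dataclass

MAX_CONTAINER_AGE_SECONDS = 60 * 60  # the one-hour limit mentioned in the description


@dataclass
class Container:
    container_id: int
    user_id: str
    started_at: float


class ContainerPool:
    """Hypothetical allocator: one container per user, replaced once it is too old."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_user = {}

    def get_container(self, user_id: str, now: float) -> Container:
        current = self._by_user.get(user_id)
        expired = current is not None and now - current.started_at > MAX_CONTAINER_AGE_SECONDS
        if current is None or expired:
            current = Container(next(self._ids), user_id, started_at=now)
            self._by_user[user_id] = current
        return current


pool = ContainerPool()
first = pool.get_container("learner-42", now=0.0)
same = pool.get_container("learner-42", now=1800.0)   # 30 minutes in: reused
fresh = pool.get_container("learner-42", now=3601.0)  # past an hour: replaced
```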

There was just so much we needed to do to make sure all those requests are allowed, and that each service recognizes the source of the request, and responds in kind. Things randomly fail, or requests will come back with an unusual response that doesn’t make sense. It’s very difficult to debug, and it really was annoying until I understood it.

It helps to have a good team of people that are really supportive and want to help. That really carries me a lot of times when I’m so frustrated that I want to break my computer. The nice thing is that other engineers have all been through this type of thing, so they know how it is. The more little wins you get, the longer you can go without one, because you know there’s another one at the end.”

Ship

“The first time seeing Jupyter Notebooks fully run in our learning environment and having it look like what I wanted it to look like was super rewarding. I remember standing up with my hands in the air — it was so exciting. It took so much effort, and you start to doubt whether it will even work, because you just keep hitting another roadblock. Then you think, this is going to fix the roadblock, and it works for a while, and then someone else breaks it.

[Image: What it looks like to have a fully functioning Jupyter Notebook embedded in our learning environment.]

For my team, the main metrics we measure for success are delivery of the product and the developer experience. So if other people are using this code or using this new feature, how easy is it for them to work with it? There are also more technical metrics with more formal tracking, such as Largest Contentful Paint, latency between requests, or if there are dropped connections. We also listen for feedback from customer support. The Curriculum team was really happy, and they’re our main sort of ‘consumer’ because they’re using Jupyter Notebooks all the time and designing the experience.”

Retrospective

“I feel like every time I take on a new project that I don’t totally understand, I learn so much new stuff. With engineering, there are infinite rabbit holes that you can go down and down until you get to, like, how circuits work. This project forced me to learn what the heck is going on behind the learning environment.

This project was spread across multiple teams and phases. Mariel Frank, Software Engineer II, was super instrumental and did a ton of work on this, as did Senior Software Engineer Tim Jenkins. Ian Munro, a Senior Software Engineer, helped out a lot. Leon Pham, a Staff Engineer on the Infrastructure team, helped me with the proxy in particular. And then we had two Product Managers, Dónal Ó Dubhthaigh and Daniel Munter.”