10/30/2019 | Alex Michael |
Polynote (HN), open-sourced by Netflix this week, is being called the Jupyter Notebook killer. This is a really high bar, because I think Jupyter Notebook is an awesome and creative piece of technology (respect to the developers) but maybe with the release of Polynote, the notebook ecosystem will get even better. The key features of Polynote that make it potentially so good are that:
Alot of the features that are really impressive about Polynote seem to come from actual use at Netflix so I am very confident that this software will be useful! You can read more about Polynote in this article from Towards Data Science.
Looks pretty sweet!
While the installation, isn’t super interesting, I hope that I can save someone a bit of the chagrin of installing this. Installation was bit of a pain, it wasn’t an easy one-line installation process like Jupyter, hopefully this will improve as the software matures. Also, it isn’t yet possible to use Polynote on Windows, only Mac and Linux. I am installing on Debian 10, so the steps here will focus on that process. I am more or less following the Polynote installation guide on the Polynote website, so this section may be redundant.
deb http://ftp.us.debian.org/debian sid main
to /etc/apt/sources.list
. This adds the packages which can be found in Debian Sid’s package repository to apt-get’s list of available packages. sudo apt-get update
sudo apt-get install openjdk-8-jdk
sudo update-alternatives --config java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
wget https://download.java.net/openjdk/jdk8u40/ri/jdk_ri-8u40-b25-linux-x64-10_feb_2015.tar.gz
sudo mkdir /usr/lib/jvm
INSTALL_DIR=/usr/lib/jvm/
# move Java 8 jdk to default installation location
tar -xvzf jdk_ri-8u40-b25-linux-x64-10_feb_2015.tar.gz
rm jdk_ri-8u40-b25-linux-x64-10_feb_2015.tar.gz
sudo mv java-se-8u40-ri $INSTALL_DIR
# set environment variables (include these lines in ~/.bashrc)
export JAVA_HOME=/usr/lib/jvm/java-se-8u40-ri
export PATH=$JAVA_HOME/bin:$PATH
# Get Spark from the mirror (example)
wget http://apache.claz.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# extract & install
tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
rm spark-2.4.4-bin-hadoop2.7.tgz
mv spark* spark
sudo mv spark /usr/local/
# add spark to environment (include these lines in ~/.bashrc)
export SPARK_HOME=/usr/local/spark/
export PYSPARK_PYTHON=python3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Install recommended Python packages
pip3 install pip3 install jep jedi pyspark virtualenv
alias polynote="$HOME/polynote/polynote/"
to your ~/.bashrc
or ~/.bash_aliases
file.One of the major problems that I encountered with installing Java was with competing installations on my machine. The reason why Polynote uses Java 8 is that this is the recommended Java release by Scala. It seems that the errors resulting from using later versions of Java been resolved in Issue #530, but I didn’t try that out. Since I don’t really use Java or Scala that much, I just uninstalled the competing installation, however for some folks this might not be an option, so running Polynote in a docker image might be the best way to go.
After the minor headache of installing Polynote, trying to get it to run. Once I got Polynote up and runing, it kept throwing off errors so I had to spend a lot of time debugging, referring to Github issues. Each of the errors I encountered were the result of improper installation (i.e. user error), but they still took quite a bit of time to sort out. Once the big errors were worked out, a few of the ‘fibers’ were still logging errors, and I was still getting an error here and there from jep, but mostly the errors seemed to be harmless.
Once I was able to start the server, the next challenge was to start loading some data in to mess with. From all indications, Polynote doesn’t really like to deal with local files the same way that Jupyter does; the modus operandi seems to be to load in all data from Spark. This makes a lot of sense given that the folks over at Netflix probably aren’t doing much local storage. To get around this limitation, and since I don’t want to stand up a spark clustor or RDBMS based data store, I will just process the CSV files locally then pass the data in a PostgreSQL database, then pull it into memory in the notebook.
My first impression when working in Polynote is that it is a really good editing environment, it is a pleasure to edit in, it has some minor bugs in its user interface, but otherwise amazing! I only ran into some minor trouble with jep (something about interpreter change, just took a simple restart).
One of the things that Polynote can do that amazed me is that you can run code out of order. If you have modified the state of your notebook, you can still go back in and add code and it works! In Jupyter, you would have to run the whole notebook over again.
You can view your data with a single click!
Polynote's workflow is really nice
I didn’t take the workflow all the way to conclusion, since I am relatively inexperienced with data science that would take a while, but I will likely continue to explore this dataset the post the notebook and some results here later.
I love Polynote so far and the team has made lots of progress even in the last week! In order to see myself using this longterm, what I’d love to see are the following:
Getting Polynote up and running was a bit painful but once I finally was able to get it working and work through some example code, I was amazed with its quality and the thought that went into it. This software is an incredibly clean implementation of a brilliant concept.
Polynote has a lot of the features that I really wanted in Jupyter and in terms of data visualization, it acts a lot like a generalization of RStudio for large scale data-science. Polynote is somewhat opinionated and big-data oriented, which is to be expected from a large consumer-facing tech company like Netflix. Once a few of the rough edges are smoothed out, I think Polynote truly will be able to give Jupyter a run for its money with the consumer market. I hope to be able to contribute and I hope polynote will continue to get support from Netflix and the open-source community! Huge ups to the developers, you guys did an amazing job with this!