The R programming language was designed for data analysts, statisticians, and developers who need to generate insights, reports, and graphics from datasets. You can use it to perform statistical and graphical techniques like linear and non-linear modeling, classification, time-series analysis, and clustering.
The R package knitr is a popular tool in the R ecosystem that makes it easier for developers to do their job. Data analysts often crunch data to come up with insights that can help make better company decisions. They also spend a lot of time creating reports to describe their findings and recording all of their information so they can share it with various team members.
Generating reports manually can get tedious, so many analysts create one-off R scripts to generate them or use knitr. Ahead, we’ll look at literate programming (a concept that knitr and similar tools use), what knitr is, and how it’s used.
What is literate programming?
Literate programming is a style of programming introduced by computer scientist Donald Knuth. Literate programs explain their logic in a natural language like English. These explanations go deeper than the comments we expect to see in most code bases. A literate programmer’s job is to write software that humans can understand, not just applications that machines can execute.
Programs in literate programming are documents containing both text for humans to read and executable chunks of code. According to Knuth, this method of programming forces the developer to state the reasons for the code they are writing in a natural language. This can make bad coding decisions more obvious. The text also serves as useful documentation that allows developers who join the project later to hit the ground running.
Today, literate programming is very popular, and millions of users rely on literate programming tools like Jupyter Notebook and JSDoc. For instance, data scientists and data analysts use tools like knitr to document their experiments with data and generate reports.
What is knitr?
The knitr package is a general-purpose literate programming tool used with the R programming language. Knitr allows you to mix any kind of text with any kind of R code in the same file.
But while you can use any type of text, it’s best to use R Markdown files that allow you to easily mix R code with Markdown text. And when you install the RStudio IDE, it comes with both the R Markdown and knitr packages to make it easier to get started.
Step 1: Start with an R Markdown file
The R Markdown format is based on the standard Markdown format, but it supports embedded R code. A standard Markdown file can be run through Pandoc or another Markdown processor to turn the text into an HTML file, PDF file, or even a Word document.
Here is an R Markdown file with embedded R code:
    ---
    output: html_document
    ---

    # This is a H1 heading for a report in R Markdown.

    ## This will become an H2.

    * These
    * Will
    * Be
    * List
    * Items

    Here is a description that will show up as a paragraph.

    Here is another paragraph that only needs a line break for separation.

    Below is some R code that will be executed and the result embedded.

    ```{r, echo=FALSE}
    plot(my_data)
    ```

The top section of this file between the two sets of three dashes is called front matter. Here, you can put metadata related to the document including the title, author, date, and more. In this file, we set the output format to be generated as HTML.
The part at the bottom between the two sets of three backticks holds a chunk of R code. You can add parameters to this chunk of code between curly braces. In this set of braces, we say the language of the code is R. Setting echo to FALSE gives us the results of the plot function without the source code that knitr echoes by default.
If you run this last file through a standard Markdown processor, it will generate a file in the format you choose, but it will not execute the R code. It will simply format the chunk as a block of source code and be done. The magic happens when you use knitr.
Step 2: Build a document with knitr
Markdown is only one of the many formats you can use with knitr, but it’s great for beginners. More experienced developers can choose from LaTeX, reStructuredText, and other formats.
If you have an R Markdown file loaded in the RStudio IDE, all you have to do to generate a report is click the “Knit HTML” button. When you do this, the knitr package will process the file and generate a file in the format you specify, which in our example will be an HTML file. You can also generate PDF files with knitr, though it might require installing supporting software.
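The same build can be started from the R console instead of the button. Here is a minimal sketch, assuming the knitr package is installed. The file contents and the temporary paths are made up for illustration, and knit() on its own stops at plain Markdown rather than HTML:

```r
library(knitr)

# Write a tiny R Markdown file to a temporary location (hypothetical content).
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Demo report",
  "",
  "The chunk below is executed by knitr.",
  "",
  "```{r, echo=FALSE}",
  "summary(c(1, 2, 3, 4))",
  "```"
), rmd_path)

# knit() runs the chunk and writes a Markdown file with the results embedded.
md_path <- tempfile(fileext = ".md")
knit(rmd_path, output = md_path, quiet = TRUE)

# Read the knitted Markdown back in to inspect it.
knitted <- readLines(md_path)
```

To go all the way to an HTML file, the usual call is rmarkdown::render(rmd_path), which also needs the rmarkdown package and Pandoc to be available.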
All the plain text markdown will be converted to HTML, and the R code block will be executed and replaced with both the source code in the block and the results from executing the code. But, if you add the echo=FALSE parameter as we did in the example above, it will only replace the code block with the results of executing it and not include the source code.
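To see the effect of the echo parameter, you can knit the same small document twice. This is only a sketch: make_doc is a hypothetical helper written for this example, and it passes the document to knit() through its text argument so no files are needed:

```r
library(knitr)

# Hypothetical helper: build a small R Markdown document as a character
# vector, with the echo option of its only chunk set to TRUE or FALSE.
make_doc <- function(echo) {
  c(
    "Some text.",
    "",
    sprintf("```{r, echo=%s}", echo),
    "x <- 2 + 3",
    "x",
    "```"
  )
}

# With text = ..., knit() returns the knitted Markdown as a string.
with_source <- knit(text = make_doc(TRUE), quiet = TRUE)
without_source <- knit(text = make_doc(FALSE), quiet = TRUE)

cat(with_source)     # contains the source lines and the result
cat(without_source)  # contains only the result
```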
What is knitr used for?
Adding extended notes throughout code and reports can be tough. You could add long comments to your code, but that can get messy, and nobody wants to dig through source code. You could write a custom script to generate a report, but then you’d have to build all your formatting in.
Changing the way the report generates the data would be relatively easy, but you would have to know the ins and outs of the styles you need to generate for either HTML or PDF reports. One change in the text could result in multiple formatting changes. Fortunately, generating reports or including extended notes along with your code using knitr is more convenient.
With knitr, developers can use the simple markdown format to add text to reports and code documents, embed code directly into the report, and click a button or run a single command that generates a report. When the data changes, the executable R code will update that part of the report. When the text needs changing, we would type the changes into the file in plain text and rebuild it.
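As a rough illustration of a report that follows its data, the sketch below knits one template against two different datasets. The template text, the sales variable, and the build_report helper are all invented for this example:

```r
library(knitr)

# A small report template (hypothetical). The text stays fixed;
# the chunk is re-run every time the report is built.
template <- c(
  "Sales summary",
  "",
  "The average sale was:",
  "",
  "```{r, echo=FALSE}",
  "mean(sales)",
  "```"
)

# Hypothetical helper: knit the template with the given data.
# The chunk looks up `sales` in the environment passed as envir.
build_report <- function(sales) {
  env <- new.env()
  env$sales <- sales
  knit(text = template, envir = env, quiet = TRUE)
}

first_report <- build_report(c(10, 20, 30))
second_report <- build_report(c(100, 200, 300))
```

Only the data passed in differs between the two calls. The surrounding text is untouched, and the embedded result changes with the data.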
Code notebooks
Data analysis and data science projects often start with experiments regarding which data you should pull to get the answers you need, which machine learning models or algorithms you should use, and how to present this data for maximum impact.
The code notebook concept works the same as a field scientist’s physical notebook. By recording changes to their code while they make them, developers can create notebooks for every step of their process. So if they take a wrong turn somewhere, they can retrace their steps back to a better version of their code to start experimenting again.
Code notebooks also require developers to think about the code they’re creating, document it, and allow them to share their results with other developers. The knitr package is one of the many literate programming tools you can use as a code notebook to track your work. Here are some other similar tools:
* Jupyter Notebook
* Apache Zeppelin
* Google Colab
* Spark Notebook
Report generation
Part of a data scientist or analyst’s job is to build the tools a business needs to capture insights about the business and market. Another part of their job is taking these insights and putting them in a form that’s easy for other people to understand. There are many methods developers use to generate reports.
Some developers create a one-off script for each report they need to generate. Then, when the report needs to be updated, they update the script. Depending on the programming language used and the libraries available in that language, this update process can get complicated. They may have to create a template for the report and a separate script to generate the data for it, then merge it with the template. They may embed the report generation functionality in their script and write extra code to format the report. This can take a few steps.
They could also use a specialized business intelligence (BI) tool, but BI tools can have limited functionality or may require a specific programming language.
Data professionals can spend less time tweaking reports by combining both text and code in the same file using R Markdown and knitr to generate reports. If the analysis needs to produce new values, they can simply edit the code chunks in the document. If the supporting information needs to be updated, they can write that out in plain text. If the document styles need tweaking, then that can be done with CSS style sheets when the report is generated.
Reproducible research
In data science, you need to be able to verify your findings. Scientific results need to be documented so that other people can follow the same path and come to the same conclusion. This requires a detailed description of the process used to collect the resulting data. The result has to be computationally reproducible with a minimum number of manual steps.
Using knitr to document your research data as you write the code helps ensure you provide adequate detail. With knitr and R Markdown, data scientists can document every step in the process used to get certain results. They can start with the source they acquired the data from, then continue with the steps used to process the data. Finally, these processes are used to analyze the data and report the answers found. By documenting every step in knitr, data scientists can be fully transparent with their process and quickly convince others of the validity of their results.
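A documented analysis along these lines might look like the sketch below. It is only an illustration: R’s built-in mtcars dataset stands in for a real data source, and the chunk labels and the run_analysis helper are invented for the example. Knitting the document twice and comparing the output is a simple check that the steps are reproducible:

```r
library(knitr)

# An R Markdown document that records each step next to its code.
analysis <- c(
  "Step 1: load the data (mtcars stands in for a real source).",
  "",
  "```{r load-data}",
  "cars <- mtcars",
  "```",
  "",
  "Step 2: process the data by keeping only four-cylinder cars.",
  "",
  "```{r process-data}",
  "small_cars <- cars[cars$cyl == 4, ]",
  "```",
  "",
  "Step 3: analyze the processed data and report the answer.",
  "",
  "```{r analyze-data}",
  "round(mean(small_cars$mpg), 2)",
  "```"
)

# Hypothetical helper: knit in a fresh environment so no leftover
# variables from an earlier run can influence the result.
run_analysis <- function() {
  knit(text = analysis, envir = new.env(), quiet = TRUE)
}

first_run <- run_analysis()
second_run <- run_analysis()

# TRUE when both runs produce exactly the same document.
identical(first_run, second_run)
```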
Learn more about R and knitr
Now you know how powerful knitr can be when you are working with data. You can combine documentation with executable code to create a record of your work or a report that you can regenerate whenever the data changes just by recompiling it. It sure beats having to update a custom report script.
To use knitr, you need to learn R, which is a great language to learn if you’re into data. You can use our free course Learn R to get started; it will introduce you to the principles of data science, data analytics, and data visualization while you get proficient at using R’s syntax. If you’re looking for something a little more advanced, we also have Analyze Data with R and Learn Statistics with R. Once you learn R and have RStudio installed, creating complex and detailed reports with knitr is just a button click away since knitr installs with RStudio.