A little comparison between R and Kap

Why This Matters

This comparison reimplements a set of R data-loading examples in Kap and shows where the two languages differ. R's defaults, such as the automatic numeric parsing in read_csv, remove steps that Kap requires you to spell out explicitly, while the Kap solutions end up shorter overall. These trade-offs affect both productivity and code readability when choosing a tool for data analysis.

Key Takeaways

Some time ago, I read this article: Why pandas feels clunky when coming from R. In it, the author explains why they feel that R is a much smoother tool than Pandas.

I'm not familiar with Pandas, but I do know a bit of R, so when I recently implemented some new features in Kap, I decided that reimplementing the examples from the blog post in Kap might be a good way to demonstrate the differences between the languages.

Spoiler: the Kap solutions are shorter, but R has some nice defaults that have to be specified explicitly in Kap. At the end of the day, it all comes down to individual preference.

Loading the dataset

In R, the function read_csv is used to load CSV data. This function automatically parses things that look like numeric values as numbers, while the corresponding function in Kap returns strings. The Kap function also makes no attempt to process the column headers, so the header row comes back as ordinary data.

purchases ← io:readCsv "purchases.csv"
┌→──────────────────────────────┐
↓ "country" "amount" "discount"│
│     "USA"   "2000"       "10"│
│     "USA"   "3500"       "15"│
│     "USA"   "3000"       "20"│
│  "Canada"    "120"       "12"│
│  "Canada"    "180"       "18"│
│  "Canada"   "3100"       "21"│
...
└───────────────────────────────┘
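For readers who do not read Kap, the behaviour described above (every cell is a string, and the header is just the first row) can be mimicked with Python's standard csv module. This is only an analogy, not Kap's implementation. The inline text is an assumption standing in for purchases.csv: it uses only the rows visible in the output above, and the comma delimiter is a guess.

```python
import csv
import io

# Assumed stand-in for purchases.csv: only the rows visible in the
# output above, and a comma delimiter. The real file contains more rows.
CSV_TEXT = """country,amount,discount
USA,2000,10
USA,3500,15
USA,3000,20
Canada,120,12
Canada,180,18
Canada,3100,21
"""


def load_raw(text):
    """Return every CSV row, header included, as a list of lists of strings."""
    return list(csv.reader(io.StringIO(text)))


table = load_raw(CSV_TEXT)
print(table[0])  # the header row is returned as ordinary data
print(table[1])  # numeric-looking cells are still strings
```

Like the Kap result, nothing here has been interpreted yet: splitting off the header and converting numbers are separate, explicit steps.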

So, the first thing we want to do is to remove the first row and use it as column labels. The simplest way to do this is to combine these using a fork:

purchases ← (>1↑)«labels»(1↓) purchases
┌───────────┬──────┬────────┐
│    country│amount│discount│
├→──────────┴──────┴────────┤
↓      "USA" "2000"     "10"│
│      "USA" "3500"     "15"│
│      "USA" "3000"     "20"│
│   "Canada"  "120"     "12"│
│   "Canada"  "180"     "18"│
│   "Canada" "3100"     "21"│
...
└───────────────────────────┘

The expression above takes the first row (1↑) and turns it into a 1-dimensional array of strings (using >), drops the first row from the data (using 1↓), and finally passes these two arrays to labels, which constructs the final result.
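The shape of a fork, two functions applied to the same argument with their results combined by a third, can be sketched in Python. The names fork, first_row_flat, drop_first_row and attach_labels are hypothetical helpers made up for this illustration. In particular, attach_labels is only a stand-in for Kap's labels and simply returns a dictionary.

```python
def fork(left, combine, right):
    """Build a function x -> combine(left(x), right(x))."""
    return lambda x: combine(left(x), right(x))


def first_row_flat(table):
    """Take the first row as a one-row table, then flatten it to a 1-D list."""
    one_row_table = table[:1]
    return one_row_table[0]


def drop_first_row(table):
    """Return the table without its first row."""
    return table[1:]


def attach_labels(names, rows):
    """Hypothetical stand-in for Kap's labels: pair column names with data."""
    return {"labels": names, "rows": rows}


with_labels = fork(first_row_flat, attach_labels, drop_first_row)

table = [
    ["country", "amount", "discount"],
    ["USA", "2000", "10"],
    ["Canada", "120", "12"],
]
result = with_labels(table)
print(result["labels"])
print(result["rows"])
```

The point of the fork is that the table is named only once. Both halves receive the same argument, which is what keeps the Kap expression so short.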

We still have to convert the strings into numbers. The function to do that is ⍎ , but we don't want to call it on the first column. This is achieved by running the parsing with under applied on a drop of the first column:
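As a rough Python analogy of the idea (not Kap code, and not how Kap implements under): select everything except the first column, apply the conversion to that selection only, then put the converted cells back next to the untouched first column. The helper name under_drop_first_column is hypothetical, and int is used as an assumed stand-in for ⍎, which only works here because the visible values are whole numbers.

```python
def under_drop_first_column(func, rows):
    """Apply func to every cell except the first in each row, then reassemble."""
    result = []
    for row in rows:
        kept, rest = row[:1], row[1:]  # rest is the "drop first column" view
        result.append(kept + [func(cell) for cell in rest])
    return result


rows = [
    ["USA", "2000", "10"],
    ["Canada", "120", "12"],
]
parsed = under_drop_first_column(int, rows)
print(parsed)
```

The first column is never passed to the conversion function, so the country names remain strings while the other columns become numbers.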

... continue reading