A little comparison between R and Kap

Some time ago, I read this article: "Why pandas feels clunky when coming from R". In it, the author explains why they feel that R is a much smoother tool than Pandas.

I'm not familiar with Pandas, but I do know a bit of R, so when I recently implemented some new features in Kap, I decided that reimplementing the examples from that blog post in Kap would be a good way to demonstrate the differences between the languages.

Spoiler: the Kap solutions are shorter, but R has some nice defaults that have to be specified explicitly in Kap. At the end of the day, it all comes down to individual preference.

Loading the dataset

In R, the function read_csv is used to load CSV data. It automatically parses things that look like numeric values as numbers, while the corresponding function in Kap returns strings. The Kap function also makes no attempt to process the column headers.

purchases ← io:readCsv "purchases.csv"
┌→──────────────────────────────┐
↓  "country" "amount" "discount"│
│      "USA"   "2000"       "10"│
│      "USA"   "3500"       "15"│
│      "USA"   "3000"       "20"│
│   "Canada"    "120"       "12"│
│   "Canada"    "180"       "18"│
│   "Canada"   "3100"       "21"│
...
└───────────────────────────────┘
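For readers who don't know Kap, the same raw shape is easy to reproduce elsewhere. Python's standard csv module happens to behave much like the Kap function described here: every cell comes back as a string, and the header is just the first row. This is only a minimal sketch; the inline sample text is an assumption that stands in for purchases.csv and uses a few of the rows shown in the output.

```python
import csv
import io

# Stand-in for purchases.csv (an assumption): a few rows taken from the output shown.
SAMPLE = """country,amount,discount
USA,2000,10
USA,3500,15
Canada,120,12
"""

def read_raw(text):
    """Return every row as a list of strings, header row included."""
    return list(csv.reader(io.StringIO(text)))

rows = read_raw(SAMPLE)
print(rows[0])  # the header is still an ordinary data row
print(rows[1])  # numeric-looking cells are still strings
```

Nothing is parsed and nothing is treated as a label, which is the starting point for the next steps.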

So, the first thing we want to do is to remove the first row and use it as column labels. The simplest way to do this is to combine these using a fork:

purchases ← (>1↑)«labels»(1↓) purchases
┌───────────┬──────┬────────┐
│    country│amount│discount│
├→──────────┴──────┴────────┤
↓      "USA" "2000"     "10"│
│      "USA" "3500"     "15"│
│      "USA" "3000"     "20"│
│   "Canada"  "120"     "12"│
│   "Canada"  "180"     "18"│
│   "Canada" "3100"     "21"│
...
└───────────────────────────┘

All the above does is take the first row (1↑) and turn it into a 1-dimensional array of strings (using >), then drop the first row (using 1↓), and finally pass these two arrays to labels, which constructs the final result.
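The data flow of the fork can be sketched in Python for readers who find the notation dense. This is an illustration only: fork, first_row, drop_first and with_labels are hypothetical helpers, and the dictionary is merely a stand-in for whatever labelled array Kap's labels actually builds.

```python
def fork(left, middle, right):
    """Build a function that computes middle(left(x), right(x))."""
    return lambda x: middle(left(x), right(x))

def first_row(rows):
    """Hypothetical counterpart of taking the first row as a flat list."""
    return list(rows[0])

def drop_first(rows):
    """Hypothetical counterpart of dropping the first row."""
    return rows[1:]

def with_labels(labels, body):
    """Hypothetical stand-in for labels: pair the column names with the data."""
    return {"labels": labels, "rows": body}

raw = [
    ["country", "amount", "discount"],
    ["USA", "2000", "10"],
    ["Canada", "120", "12"],
]

split_and_label = fork(first_row, with_labels, drop_first)
table = split_and_label(raw)
print(table["labels"])
print(table["rows"])
```

The point is that both outer functions receive the same argument, and the middle function combines their results, so the input only has to be named once.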

We still have to convert the strings into numbers. The function to do that is ⍎, but we don't want to call it on the first column. This is achieved by running the parsing with under applied to a drop of the first column.
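Conceptually, this means transforming only a selected part of the table and leaving the rest where it was. A minimal Python sketch of that idea follows; it is not how Kap implements under. The helper under_drop_first_column is hypothetical, and int is assumed as a stand-in for the ⍎ parser.

```python
def under_drop_first_column(func, rows):
    """Apply func to every cell except those in the first column.

    The first column is kept untouched and the transformed cells are
    put back in their original positions.
    """
    return [row[:1] + [func(cell) for cell in row[1:]] for row in rows]

body = [
    ["USA", "2000", "10"],
    ["Canada", "120", "12"],
]

# int is an assumption here; it only handles whole numbers.
converted = under_drop_first_column(int, body)
print(converted)
```

The country names stay as strings while the amount and discount columns become numbers, which matches the goal described above.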
