Code style

Good coding style is like correct spacing: you can manage without, butitsuremakesthingseaisertoread. :-)

Following some general rules on code styles with R and Python makes it easier for you and collaborators to read, understand, and thus improve code.

R
Python

library(tidyverse)
library(tidyquant)
library(scales)

prices <- tq_get("AAPL",
  get = "stock.prices",
  from = "2000-01-01",
  to = "2022-12-31"
)

import yfinance as yf

prices = yf.download(
    tickers="AAPL", 
    start="2000-01-01", 
    end="2022-12-31", 
    progress=False
  )

Name Conventions

As a fundamental guideline, opt for longer, descriptive names that offer clarity rather than opting for concise names for the sake of quick typing. While brief names may save minimal time during initial code composition, especially with autocomplete assistance, they can prove time-consuming when revisiting older code and attempting to decipher cryptic abbreviations.

# Strive for:
prices_high <- prices |> filter(adjusted > 0.9)

# Avoid:
PRICESHIGH <- prices |> filter(adjusted > 0.9)

Spaces

Put spaces on either side of mathematical operators apart from ^ (i.e. +, -, ==, <, …), and around the assignment operator (<-).

R
Python

a <- 2
b <- 1
d <- 2
# Strive for
z <- (a + b)^2 / d

# Avoid
z<-(a+b)^2/d

a = 2
b = 1
d = 2

# Strive for
z = (a + b)**2 / d

# Avoid
z=(a+b)**2/d

Don’t put spaces inside or outside parentheses for regular function calls. Always put a space after a comma, just like in standard English. This also valid for python.

# Strive for}
x <- c(1, 2, 3, 4)

mean(x, na.rm = TRUE)

# Avoid
mean (x ,na.rm=TRUE)

Pipes

|> and . should always have a space before it and should typically be the last thing on a line. This makes it easier to add new steps, rearrange existing steps, modify elements within a step, and get a 10,000 ft view by skimming the verbs on the left-hand side.

R
Python

# Strive for 
prices |>  
  filter(!is.na(date), !is.na(adjusted)) |> 
  count(symbol)

# Avoid
prices|>filter(!is.na(date), !is.na(adjusted))|>count(symbol)

import pandas as pd

# Strive for
(prices
  .query('not date.isna() and not adjusted.isna()')
  .groupby('symbol')
  .count()
)

# Avoid
prices.query('not date.isna() and not adjusted.isna()').groupby('symbol').count()

If the function you’re chaining into has named arguments (e.g., mutate() or summarize() in R, or functions with named parameters in Python), it is beneficial to put each argument on a new line. This practice enhances readability, making it easier to add, rearrange, or modify elements within each step.

However, if the function being chained does not have named arguments (e.g., select() or filter() in R, or functions without named parameters in Python), keeping everything on one line is acceptable. If it doesn’t fit on a single line, consider putting each argument on its own line for better visibility.

R
Python

# Strive for
prices |>  
  group_by(volume) |> 
  summarize(
    mean_adjusted = mean(adjusted, na.rm = TRUE),
    n = n()
  )

# Avoid
prices |>
  group_by(
    volume
  ) |> 
  summarize(mean_adjusted = mean(adjusted, na.rm = TRUE), n = n())

# Strive for
<<<<<<< HEAD
(prices
  .groupby('volume')
  .agg(adjusted=('adjusted', 'mean'), n=('adjusted', 'size'))
  .reset_index()
)

# Avoid
prices.groupby('volume').agg(delay=('adjusted', 'mean'), n=('adjusted', 'size')).reset_index()
=======
prices.groupby('volume') \
      .agg(adjusted=('adjusted', 'mean'), n=('adjusted', 'size')) \
      .reset_index()

# Avoid
prices.groupby(
      'volume') \
     .agg(adjusted=('adjusted', 'mean'), n=('adjusted', 'size')) \
     .reset_index()
>>>>>>> 2c0520acb673c394563a4b07f3ce496591c48095

After the first step of the pipeline, indent each line by two spaces. RStudio will automatically put the spaces in for you after a line break following a > . If you’re putting each argument on its own line, indent by an extra two spaces. Make sure ) is on its own line, and un-indented to match the horizontal position of the function name.

R
Python

# Strive for 
prices |>  
  group_by(volume) |> 
  summarize(
   mean_adjusted = mean(adjusted, na.rm = TRUE),
    n = n()
  )

# Avoid
prices|>
  group_by(volume) |> 
  summarize(
            mean_adjusted = mean(adjusted, na.rm = TRUE), 
             n = n()
           )

# Avoid
prices|>
  group_by(volume) |> 
  summarize(
  mean_adjusted = mean(adjusted, na.rm = TRUE), 
  n = n()
  )

#Strive for

(prices
  .groupby('volume')
  .agg(mean_adjusted=('adjusted', 'mean'), n=('adjusted', 'size'))
  .reset_index()
)

# Avoid

(prices
  .groupby('volume')
  .agg(
   mean_adjusted=('adjusted', 'mean'), n=('adjusted', 'size'))
  .reset_index()
 )

ggplot2

The same basic rules that apply to the pipe also apply to ggplot2; just treat + the same way as |>

Python

import pandas as pd
from plotnine import ggplot, aes, geom_point, geom_line, labs, theme_minimal

filtered_prices = prices[prices['volume'] > 3000]

(ggplot(filtered_prices, aes(x='date', y='adjusted')) +
 geom_point() +
 geom_line(aes(group=1)) +  # Specify a grouping variable to avoid warning
 labs(x="Date", y="Adjusted Price", title="Adjusted Prices when Volume > 3000") +
 theme_minimal())

Styler package(for R)

First install the styler package, restart R studio. Go to “Addins”, and click on “Set style”. Afterwards, select the code you want to style,go again to “Addins”, click on “Style selection”. Try with the code below:

# Strive for
z <- (a + b)^2 / d

# Avoid
z <- (a + b)^2/d

Notice as the adjustment the styler package implements.

PEP8 package(for Python)

An equivalent tool in Python, known as PEP8, offers a different approach to code formatting. By executing the command autopep8 your_file_name.py --in-place --aggressive --verbose in the terminal, the code will be automatically adjusted to adhere to the PEP8 styling guidelines.

# Strive for
z = (a + b)**2 / d

# Avoid
z = (a + b)**2/d