library(tidyverse)
library(tidyquant)
library(scales)
<- tq_get("AAPL",
prices get = "stock.prices",
from = "2000-01-01",
to = "2022-12-31"
)
Code style
Good coding style is like correct spacing: you can manage without, butitsuremakesthingseaisertoread. :-)
Following some general rules on code styles with R and Python makes it easier for you and collaborators to read, understand, and thus improve code.
import yfinance as yf
= yf.download(
prices ="AAPL",
tickers="2000-01-01",
start="2022-12-31",
end=False
progress )
Name Conventions
As a fundamental guideline, opt for longer, descriptive names that offer clarity rather than opting for concise names for the sake of quick typing. While brief names may save minimal time during initial code composition, especially with autocomplete assistance, they can prove time-consuming when revisiting older code and attempting to decipher cryptic abbreviations.
# Strive for:
<- prices |> filter(adjusted > 0.9)
prices_high
# Avoid:
<- prices |> filter(adjusted > 0.9) PRICESHIGH
Spaces
Put spaces on either side of mathematical operators apart from ^ (i.e. +, -, ==, <, …), and around the assignment operator (<-).
<- 2
a <- 1
b <- 2
d # Strive for
<- (a + b)^2 / d
z
# Avoid
<-(a+b)^2/d z
= 2
a = 1
b = 2
d
# Strive for
= (a + b)**2 / d
z
# Avoid
=(a+b)**2/d z
Don’t put spaces inside or outside parentheses for regular function calls. Always put a space after a comma, just like in standard English. This also valid for python.
# Strive for}
<- c(1, 2, 3, 4)
x
mean(x, na.rm = TRUE)
# Avoid
mean (x ,na.rm=TRUE)
Pipes
|> and . should always have a space before it and should typically be the last thing on a line. This makes it easier to add new steps, rearrange existing steps, modify elements within a step, and get a 10,000 ft view by skimming the verbs on the left-hand side.
# Strive for
|>
prices filter(!is.na(date), !is.na(adjusted)) |>
count(symbol)
# Avoid
|>filter(!is.na(date), !is.na(adjusted))|>count(symbol) prices
import pandas as pd
# Strive for
(prices'not date.isna() and not adjusted.isna()')
.query('symbol')
.groupby(
.count()
)
# Avoid
'not date.isna() and not adjusted.isna()').groupby('symbol').count() prices.query(
If the function you’re chaining into has named arguments (e.g., mutate()
or summarize()
in R, or functions with named parameters in Python), it is beneficial to put each argument on a new line. This practice enhances readability, making it easier to add, rearrange, or modify elements within each step.
However, if the function being chained does not have named arguments (e.g., select()
or filter()
in R, or functions without named parameters in Python), keeping everything on one line is acceptable. If it doesn’t fit on a single line, consider putting each argument on its own line for better visibility.
# Strive for
|>
prices group_by(volume) |>
summarize(
mean_adjusted = mean(adjusted, na.rm = TRUE),
n = n()
)
# Avoid
|>
prices group_by(
volume|>
) summarize(mean_adjusted = mean(adjusted, na.rm = TRUE), n = n())
# Strive for
<<<<<<< HEAD
(prices'volume')
.groupby(=('adjusted', 'mean'), n=('adjusted', 'size'))
.agg(adjusted
.reset_index()
)
# Avoid
'volume').agg(delay=('adjusted', 'mean'), n=('adjusted', 'size')).reset_index()
prices.groupby(=======
'volume') \
prices.groupby(=('adjusted', 'mean'), n=('adjusted', 'size')) \
.agg(adjusted
.reset_index()
# Avoid
prices.groupby('volume') \
=('adjusted', 'mean'), n=('adjusted', 'size')) \
.agg(adjusted
.reset_index()>>>>>>> 2c0520acb673c394563a4b07f3ce496591c48095
After the first step of the pipeline, indent each line by two spaces. RStudio will automatically put the spaces in for you after a line break following a > . If you’re putting each argument on its own line, indent by an extra two spaces. Make sure ) is on its own line, and un-indented to match the horizontal position of the function name.
# Strive for
|>
prices group_by(volume) |>
summarize(
mean_adjusted = mean(adjusted, na.rm = TRUE),
n = n()
)
# Avoid
|>
pricesgroup_by(volume) |>
summarize(
mean_adjusted = mean(adjusted, na.rm = TRUE),
n = n()
)
# Avoid
|>
pricesgroup_by(volume) |>
summarize(
mean_adjusted = mean(adjusted, na.rm = TRUE),
n = n()
)
#Strive for
(prices'volume')
.groupby(=('adjusted', 'mean'), n=('adjusted', 'size'))
.agg(mean_adjusted
.reset_index()
)
# Avoid
(prices'volume')
.groupby(
.agg(=('adjusted', 'mean'), n=('adjusted', 'size'))
mean_adjusted
.reset_index() )
ggplot2
The same basic rules that apply to the pipe also apply to ggplot2; just treat + the same way as |>
import pandas as pd
from plotnine import ggplot, aes, geom_point, geom_line, labs, theme_minimal
= prices[prices['volume'] > 3000]
filtered_prices
='date', y='adjusted')) +
(ggplot(filtered_prices, aes(x+
geom_point() =1)) + # Specify a grouping variable to avoid warning
geom_line(aes(group="Date", y="Adjusted Price", title="Adjusted Prices when Volume > 3000") +
labs(x theme_minimal())
Styler package(for R)
First install the styler package, restart R studio. Go to “Addins”, and click on “Set style”. Afterwards, select the code you want to style,go again to “Addins”, click on “Style selection”. Try with the code below:
# Strive for
<- (a + b)^2 / d
z
# Avoid
<- (a + b)^2/d z
Notice as the adjustment the styler
package implements.
PEP8 package(for Python)
An equivalent tool in Python, known as PEP8, offers a different approach to code formatting. By executing the command autopep8 your_file_name.py --in-place --aggressive --verbose
in the terminal, the code will be automatically adjusted to adhere to the PEP8 styling guidelines.
# Strive for
= (a + b)**2 / d
z
# Avoid
= (a + b)**2/d z