Welcome to Pandas Utilities documentation!

High-level tools for common Pandas workflows

To get started, look here.

An overview of pd_utils

This is a collection of various functions to extend pandas. Here is a listing of the current functions.

Merge

left_merge_latest(df, df2, on)

Left merges df2 to df using on, but grabbing the most recent observation (right_datevar will be the soonest earlier than left_datevar).

groupby_merge(df, byvars, ...)

Creates a pandas groupby object, applies the aggregation function in func_str, and merges back the aggregated data to the original dataframe.

groupby_index(df, byvars[, ...])

Returns a dataframe which is a copy of the old one with an additional column containing an index by groups.

apply_func_to_unique_and_merge(...)

This function reduces the given series down to unique values, applies the function, then expands back up to the original shape of the data.

Date-time Handling

convert_sas_date_to_pandas_date(...)

Converts a date or Series of dates loaded from a SAS SAS7BDAT file to a pandas date type.

year_month_from_date(df)

Takes a dataframe with a datetime object and creates year and month variables

expand_time(df[, ...])

Creates new observations in the dataset advancing the time by the int or list given.

expand_months(df[, ...])

Takes a monthly dataframe and returns a daily (trade day or calendar day) dataframe.

tradedays()

Used for constructing a range of dates with pandas date_range function.

USTradingCalendar(...)

The US trading day calendar behind the function tradedays().

Fill Data

fillna_by_groups(df, byvars)

Fills missing values by group, with different handling for string variables versus numeric

fillna_by_groups_and_keep_one_per_group(df, ...)

Fills missing values by group, with different handling for string variables versus numeric, then keeps one observation per group.

add_missing_group_rows(df, ...)

Adds rows so that each group has all non group IDs, optionally filling values by a pandas fill method

fill_excluded_rows(df, byvars)

Takes a dataframe which does not contain all possible combinations of byvars as rows.

Transform

averages(df, avgvars, byvars)

Returns equal- and value-weighted averages of variables within groups

state_abbrev(df, col[, ...])

Replaces a DataFrame's column of a state abbreviation or state name to the opposite

long_to_wide(df, ...[, ...])

Takes a "long" format DataFrame and converts to a "wide" format

winsorize(df, pct[, ...])

Finds observations above the pct percentile and replaces the with the pct percentile value.

var_change_by_groups(df, ...)

Used for getting variable changes over time within bygroups.

join_col_strings(df, cols)

Takes a dataframe and column name(s) and concatenates string versions of the columns with those names.

Portfolios

portfolio(df, groupvar[, ...])

Constructs portfolios based on percentile values of groupvar.

portfolio_averages(df, ...[, ...])

Creates portfolios and calculates equal- and value-weighted averages of variables within portfolios.

long_short_portfolio(df, portvar)

Takes a df with a column of numbered portfolios and creates a new portfolio which is long the top portfolio and short the bottom portfolio.

Correlations

formatted_corr_df(df[, cols])

Calculates correlations on a DataFrame and displays only the lower triangular of the resulting correlation DataFrame.

Cumulate

cumulate(df, cumvars, method[, ...])

Cumulates a variable over time.

Regressions

reg_by(df, yvar, xvars, groupvar)

Runs a regression of df[yvar] on df[xvars] by values of groupvar.

Querying

select_rows_by_condition_on_columns(df, ...)

Selects rows of a pandas dataframe by evaluating a condition on a subset of the dataframe's columns.

sql(df_list, query)

Convenience function for running a pandasql query.

Loading Data

load_sas(filepath[, csv])

Loads sas sas7bdat file into a pandas DataFrame.

Testing

to_copy_paste(df[, index, ...])

Takes a dataframe and prints all of its data in such a format that it can be copy-pasted to create a new dataframe from the pandas.DataFrame() constructor.

Indices and tables