Welcome to Pandas Utilities documentation!¶

High-level tools for common Pandas workflows

To get started, look here.

Tutorial

An overview of `pd_utils`¶

This is a collection of various functions to extend pandas. Here is a listing of the current functions.

Quick Links¶

Find the source code on Github.

Merge¶

`left_merge_latest`(df, df2, on)	Left merges df2 to df using on, but grabbing the most recent observation (right_datevar will be the soonest earlier than left_datevar).
`groupby_merge`(df, byvars, ...)	Creates a pandas groupby object, applies the aggregation function in func_str, and merges back the aggregated data to the original dataframe.
`groupby_index`(df, byvars[, ...])	Returns a dataframe which is a copy of the old one with an additional column containing an index by groups.
`apply_func_to_unique_and_merge`(...)	This function reduces the given series down to unique values, applies the function, then expands back up to the original shape of the data.

Date-time Handling¶

`convert_sas_date_to_pandas_date`(...)	Converts a date or Series of dates loaded from a SAS SAS7BDAT file to a pandas date type.
`year_month_from_date`(df)	Takes a dataframe with a datetime object and creates year and month variables
`expand_time`(df[, ...])	Creates new observations in the dataset advancing the time by the int or list given.
`expand_months`(df[, ...])	Takes a monthly dataframe and returns a daily (trade day or calendar day) dataframe.
`tradedays`()	Used for constructing a range of dates with pandas date_range function.
`USTradingCalendar`(...)	The US trading day calendar behind the function `tradedays()`.

Fill Data¶

`fillna_by_groups`(df, byvars)	Fills missing values by group, with different handling for string variables versus numeric
`fillna_by_groups_and_keep_one_per_group`(df, ...)	Fills missing values by group, with different handling for string variables versus numeric, then keeps one observation per group.
`add_missing_group_rows`(df, ...)	Adds rows so that each group has all non group IDs, optionally filling values by a pandas fill method
`fill_excluded_rows`(df, byvars)	Takes a dataframe which does not contain all possible combinations of byvars as rows.

Transform¶

`averages`(df, avgvars, byvars)	Returns equal- and value-weighted averages of variables within groups
`state_abbrev`(df, col[, ...])	Replaces a DataFrame's column of a state abbreviation or state name to the opposite
`long_to_wide`(df, ...[, ...])	Takes a "long" format DataFrame and converts to a "wide" format
`winsorize`(df, pct[, ...])	Finds observations above the pct percentile and replaces the with the pct percentile value.
`var_change_by_groups`(df, ...)	Used for getting variable changes over time within bygroups.
`join_col_strings`(df, cols)	Takes a dataframe and column name(s) and concatenates string versions of the columns with those names.

Portfolios¶

`portfolio`(df, groupvar[, ...])	Constructs portfolios based on percentile values of groupvar.
`portfolio_averages`(df, ...[, ...])	Creates portfolios and calculates equal- and value-weighted averages of variables within portfolios.
`long_short_portfolio`(df, portvar)	Takes a df with a column of numbered portfolios and creates a new portfolio which is long the top portfolio and short the bottom portfolio.

Correlations¶

formatted_corr_df(df[, cols])

Calculates correlations on a DataFrame and displays only the lower triangular of the resulting correlation DataFrame.

Cumulate¶

cumulate(df, cumvars, method[, ...])

Cumulates a variable over time.

Regressions¶

reg_by(df, yvar, xvars, groupvar)

Runs a regression of df[yvar] on df[xvars] by values of groupvar.

Querying¶

`select_rows_by_condition_on_columns`(df, ...)	Selects rows of a pandas dataframe by evaluating a condition on a subset of the dataframe's columns.
`sql`(df_list, query)	Convenience function for running a pandasql query.

Loading Data¶

load_sas(filepath[, csv])

Loads sas sas7bdat file into a pandas DataFrame.

Testing¶

to_copy_paste(df[, index, ...])

Takes a dataframe and prints all of its data in such a format that it can be copy-pasted to create a new dataframe from the pandas.DataFrame() constructor.

API Documentation

pd_utils
- pd_utils package