Welcome to the Pandas Utilities documentation!
High-level tools for common Pandas workflows
To get started, look here.
An overview of pd_utils
This is a collection of functions that extend pandas. Here is a listing of the current functions.
Quick Links
Find the source code on Github.
Merge
- Left merges df2 onto df on the columns given by on, keeping only the most recent observation (right_datevar will be the closest date earlier than left_datevar).
- Creates a pandas groupby object, applies the aggregation function in func_str, and merges the aggregated data back onto the original dataframe.
- Returns a copy of the dataframe with an additional column containing an index by groups.
- Reduces the given series to its unique values, applies the function, then expands the result back to the original shape of the data.
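The merge helpers above correspond to standard pandas operations. The following is a minimal sketch in plain pandas, not pd_utils' own functions (whose names and signatures this page does not show). pd.merge_asof performs a "most recent earlier observation" left merge, groupby(...).transform broadcasts a group aggregate back onto every row, and mapping over unique values avoids repeating an expensive conversion. All column names are hypothetical. Setting allow_exact_matches=False is an assumption based on the "earlier than" wording; set it to True if same-date matches should count.

```python
import pandas as pd

# Hypothetical data: rows to enrich, and irregularly dated observations
left = pd.DataFrame({
    "firm": ["A", "A", "B"],
    "date": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-06-30"]),
})
right = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "report_date": pd.to_datetime(["2020-01-15", "2020-04-15", "2020-02-01", "2020-07-01"]),
    "value": [1.0, 2.0, 10.0, 20.0],
})

# merge_asof requires both frames to be sorted on their date keys
latest = pd.merge_asof(
    left.sort_values("date"),
    right.sort_values("report_date"),
    left_on="date",
    right_on="report_date",
    by="firm",                  # only match observations for the same firm
    direction="backward",       # look back in time for the closest earlier date
    allow_exact_matches=False,  # assumption: strictly earlier dates only
)

# Group aggregate merged back onto every original row
trades = pd.DataFrame({"firm": ["A", "A", "B"], "ret": [1.0, 3.0, 5.0]})
trades["ret_mean"] = trades.groupby("firm")["ret"].transform("mean")

# Apply a function once per unique value, then map the results back
raw = pd.Series(["2020-01-31", "2020-01-31", "2020-02-29"])
uniques = raw.drop_duplicates()
lookup = dict(zip(uniques, pd.to_datetime(uniques)))
parsed = raw.map(lookup)
```

The unique-value approach is most useful when the input has many repeated values and the function is slow, as with date parsing.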
Date-time Handling
- Converts a date or Series of dates loaded from a SAS SAS7BDAT file to a pandas date type.
- Takes a dataframe with a datetime column and creates year and month variables.
- Creates new observations in the dataset, advancing the time by the int or list given.
- Takes a monthly dataframe and returns a daily (trading day or calendar day) dataframe.
- Used for constructing a range of dates with the pandas date_range function.
- The US trading day calendar behind the function.
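The following is a rough plain-pandas sketch of the date handling described above. It is not pd_utils' implementation. It assumes SAS dates are stored as day counts from 1960-01-01 (SAS datetimes count seconds instead). pandas' USFederalHolidayCalendar stands in for a trading-day calendar here. It is not an exchange calendar: it omits Good Friday and includes Columbus Day and Veterans Day, for example. The calendar pd_utils actually uses may therefore differ.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# SAS dates are day counts from 1960-01-01
sas_days = pd.Series([0, 366, 21915])
dates = pd.to_datetime(sas_days, unit="D", origin=pd.Timestamp("1960-01-01"))

# Year and month variables from a datetime column
df = pd.DataFrame({"date": dates})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Business days that skip US federal holidays (a stand-in for a trading calendar)
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
trade_days = pd.date_range("2020-07-01", "2020-07-07", freq=bday_us)
```

In 2020, July 4 fell on a Saturday, so the federal calendar observes the holiday on Friday, July 3. That day and the weekend are therefore excluded from trade_days.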
Fill Data
- Fills missing values by group, with different handling for string variables versus numeric variables.
- Fills missing values by group, with different handling for string variables versus numeric variables, then keeps one observation per group.
- Adds rows so that each group has all non-group IDs, optionally filling values by a pandas fill method.
- Takes a dataframe which does not contain all possible combinations of byvars as rows.
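The following is a minimal sketch of group-wise filling and of adding missing group rows in plain pandas. It forward-fills and then back-fills numeric values within each group only. The separate string handling mentioned above is not reproduced. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "year": [2018, 2019, 2020, 2018, 2019],
    "x": [1.0, None, 3.0, None, 5.0],
})

# Fill within each firm only, so values never leak across groups
df["x_filled"] = df.groupby("firm")["x"].transform(lambda s: s.ffill().bfill())

# Add rows so every firm has every year (B is missing 2020)
full_index = pd.MultiIndex.from_product(
    [df["firm"].unique(), df["year"].unique()], names=["firm", "year"]
)
balanced = df.set_index(["firm", "year"]).reindex(full_index).reset_index()
```

The rows added by reindex contain NaN. A group-wise fill like the one above can then be applied to them if desired.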
Transform
- Returns equal- and value-weighted averages of variables within groups.
- Converts a DataFrame column of state abbreviations to state names, or of state names to abbreviations.
- Takes a "long" format DataFrame and converts it to a "wide" format.
- Finds observations above the pct percentile and replaces them with the pct percentile value.
- Used for getting variable changes over time within by-groups.
- Takes a dataframe and column name(s) and concatenates string versions of the columns with those names.
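Several of the transforms above have direct plain-pandas counterparts. The following sketch shows a long-to-wide pivot, within-group changes, top-only winsorization at a percentile, and string concatenation of columns. It illustrates the ideas only and is not pd_utils' code. The data, column names, and the 75th-percentile cap are hypothetical.

```python
import pandas as pd

long = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "x": [1.0, 2.0, 3.0, 50.0],
})

# Long -> wide: one row per firm, one column per year
wide = long.pivot(index="firm", columns="year", values="x")

# Change from the previous period within each firm (rows must be time-ordered)
long = long.sort_values(["firm", "year"])
long["x_change"] = long.groupby("firm")["x"].diff()

# Cap values above the 75th percentile at that percentile's value
cap = long["x"].quantile(0.75)
long["x_wins"] = long["x"].clip(upper=cap)

# Concatenate string versions of several columns
long["firm_year"] = long[["firm", "year"]].astype(str).agg("_".join, axis=1)
```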
Portfolios
- Constructs portfolios based on percentile values of groupvar.
- Creates portfolios and calculates equal- and value-weighted averages of variables within portfolios.
- Takes a df with a column of numbered portfolios and creates a new portfolio which is long the top portfolio and short the bottom portfolio.
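The following sketch shows the portfolio workflow above in plain pandas. It assumes portfolios are terciles of a sorting variable (here size), formed within each month with pd.qcut. The breakpoint and labeling conventions in pd_utils may differ. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical panel: six stocks observed in two months
df = pd.DataFrame({
    "month": [1] * 6 + [2] * 6,
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] * 2,
    "ret": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 8.0],
})

# Size terciles within each month: 1 = smallest, 3 = largest
df["portfolio"] = df.groupby("month")["size"].transform(
    lambda s: pd.qcut(s, 3, labels=False) + 1
)

# Equal-weighted portfolio returns
ew = df.groupby(["month", "portfolio"])["ret"].mean()

# Value-weighted returns, weighting each stock by its size
sums = (
    df.assign(w_ret=df["ret"] * df["size"])
    .groupby(["month", "portfolio"])[["w_ret", "size"]]
    .sum()
)
vw = sums["w_ret"] / sums["size"]

# Long the top portfolio, short the bottom one
ew_wide = ew.unstack("portfolio")
long_short = ew_wide[3] - ew_wide[1]
```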
Correlations
- Calculates correlations on a DataFrame and displays only the lower triangle of the resulting correlation DataFrame.
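One way to produce a lower-triangle correlation table in plain pandas and NumPy is to mask everything above the diagonal. This sketch keeps the diagonal, which is an assumption; pd_utils' formatting may differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})
corr = df.corr()

# Boolean mask that is True on and below the diagonal
mask = np.tril(np.ones(corr.shape, dtype=bool))
lower = corr.where(mask)  # entries above the diagonal become NaN
```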
Cumulate
- Cumulates a variable over time.
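The following is a minimal sketch of cumulating a variable over time within groups. It assumes the variable is a simple return that compounds multiplicatively. pd_utils may support other cumulation conventions, and the column names are hypothetical.

```python
import pandas as pd

# Rows are assumed to be in time order within each firm
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "ret": [0.1, 0.1, -0.5, 0.2, 0.0],
})

# Compound (1 + r) within each firm, then convert back to a return
df["cum_ret"] = (1 + df["ret"]).groupby(df["firm"]).cumprod() - 1
```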
Regressions
- Runs a regression of df[yvar] on df[xvars] by values of groupvar.
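A by-group regression amounts to fitting one model per value of the grouping variable. The following sketch uses a hypothetical helper, ols_by_group, that fits ordinary least squares with an intercept using NumPy's least-squares solver. It returns coefficients only (no standard errors) and does not reproduce pd_utils' function or its output format.

```python
import numpy as np
import pandas as pd

def ols_by_group(df: pd.DataFrame, yvar: str, xvars: list, groupvar: str) -> pd.DataFrame:
    """Hypothetical helper: OLS coefficients of yvar on xvars (plus a constant) per group."""
    rows = {}
    for key, g in df.groupby(groupvar):
        # Design matrix: a column of ones for the intercept, then the regressors
        X = np.column_stack([np.ones(len(g)), g[xvars].to_numpy(dtype=float)])
        y = g[yvar].to_numpy(dtype=float)
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows[key] = coefs
    return pd.DataFrame.from_dict(rows, orient="index", columns=["const", *xvars])

df = pd.DataFrame({
    "group": ["g1"] * 4 + ["g2"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0] * 2,
    "y": [1.0, 3.0, 5.0, 7.0, -1.0, -0.5, 0.0, 0.5],  # y = 1 + 2x and y = -1 + 0.5x
})
coefs = ols_by_group(df, "y", ["x"], "group")
```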
Querying
- Selects rows of a pandas dataframe by evaluating a condition on a subset of the dataframe's columns.
- Convenience function for running a pandasql query.
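Selecting rows by a condition on a subset of columns can be expressed in plain pandas with a boolean mask reduced across columns. This sketch covers only that idea and does not use pandasql. The column names and conditions are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "q1": [0.0, 5.0, None, -2.0],
    "q2": [0.0, 0.0, 7.0, -1.0],
})
cols = ["q1", "q2"]

# Rows where the condition holds in at least one of the chosen columns
any_positive = df[(df[cols] > 0).any(axis=1)]

# Rows where the condition holds in every chosen column (NaN compares as False)
all_nonpositive = df[(df[cols] <= 0).all(axis=1)]
```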
Loading Data
- Loads a SAS sas7bdat file into a pandas DataFrame.
Testing
- Takes a dataframe and prints all of its data in such a format that it can be copy-pasted to create a new dataframe from the pandas.DataFrame() constructor.
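The copy-paste idea can be sketched with DataFrame.to_dict. The hypothetical helper below, constructor_code, builds source text for a pd.DataFrame(...) call. It captures only column data (the index is dropped), and it assumes values whose repr is valid Python (numbers other than NaN, strings, None). The eval call here only checks the round trip and should not be used on untrusted input.

```python
import pandas as pd

def constructor_code(df: pd.DataFrame) -> str:
    """Hypothetical helper: source text that rebuilds df's columns with pd.DataFrame()."""
    return f"pd.DataFrame({df.to_dict(orient='list')!r})"

df = pd.DataFrame({"firm": ["A", "B"], "ret": [0.1, -0.2], "n": [3, 4]})
code = constructor_code(df)
print(code)

# Round-trip check: evaluate the generated text and rebuild the frame
rebuilt = eval(code, {"pd": pd})
```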
- pd_utils
  - pd_utils package
    - Subpackages
    - Submodules
      - pd_utils.corr module
      - pd_utils.cum module
      - pd_utils.datetime_utils module
      - pd_utils.filldata module
      - pd_utils.load module
      - pd_utils.merge module
      - pd_utils.plot module
      - pd_utils.port module
      - pd_utils.query module
      - pd_utils.regby module
      - pd_utils.testing module
      - pd_utils.timer module
      - pd_utils.transform module
      - pd_utils.utils module