Understanding Complex Results
Learn how to analyze and communicate complicated results using visualization.
Resources
Introduction to Visualization
Notes
Visualization is a key modeling concept: we often have many different outputs to understand, but humans are poor at gaining understanding from looking at lots of numbers.
Thoughtfully creating appropriate visualizations will allow someone to glance at your model and gain immediate understanding at a much richer level.
Tables are a more primitive form of visualization that lay out the numbers in a structured format, while charts and graphs can summarize a lot of numbers in one picture.
For the most part, visualization in Excel is straightforward: insert chart and follow the prompts. Your numbers should already be in tables.
Python, being open-source and developed by the community, has a dizzying array of options for visualization. You can do far more than in Excel, including interactive plots, but it is generally a bit more complicated to work with.
In this course, we will focus on pandas (powered by matplotlib) to produce graphs simply.
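As a rough sketch of what producing a graph through pandas can look like: the salary figures and column names below are made up for illustration and are not taken from the course model.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; not needed in Jupyter
import pandas as pd

# Hypothetical salaries over six years, with a promotion jump in year 5
salaries = pd.DataFrame({
    "Year": [1, 2, 3, 4, 5, 6],
    "Salary": [60000, 61200, 62424, 63672, 73000, 74460],
})

# pandas draws the chart with matplotlib behind the scenes and
# returns the matplotlib Axes object that holds the line
ax = salaries.plot(x="Year", y="Salary")
```

In a Jupyter notebook the chart would normally display directly below the cell, so the backend line at the top can be left out.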
Transcript
- hey everyone
- nick derobertis here teaching you
- financial modeling and today
- is the first lecture in our next lecture
- series on understanding complex
- results digging into visualization
- so the lecture today is an introduction
- to visualization just generally talking
- about
- why we want to do visualization when
- it's useful
- and the overview of what it looks like
- in both excel and python
- so when we think about visualization
- uh it's a way of getting understanding
- of more than one number at once
- and in our models so far we've just had
- one main output
- thinking about the salary model the
- dynamic salary retirement model that
- we've built out we've had the number of
- years to retirement
- as our main output from the model
- but we've also had salaries over time
- and wealth over time as
- outputs but each of those represents a
- lot of different numbers
- and so it's a little bit difficult to
- just uh only with the numbers
- present that in a way that someone can
- very easily
- understand what it looks like over time
- at a quick glance so that's where
- visualization becomes useful
- is any time where you have multiple
- different
- numbers that you want to show some kind
- of summarization
- of that information and you want it to
- be
- in a much more digestible format and
- these visualizations
- can be very powerful for getting very
- quick understanding
- of complex results
- so you know they say a picture is worth
- a thousand words
- and it definitely is true uh in the
- context of
- visualizing our results you know
- humans are just
- really bad in general at looking at a
- bunch of numbers
- and making some kind of interpretation
- out of it
- that's something that machines are very
- good at but
- humans are bad at and so humans
- we're visual creatures and so we have to
- display our data in a visual way that
- makes sense to
- a human so thinking about the
- the way that we have our results so far
- uh in excel
- we have something like this which shows
- our salaries and wealth over time in a
- tabular
- format so that is kind of the more basic
- form of visualization is just to lay the
- numbers out in a table
- you already get more context than just
- displaying the numbers at least they're
- laid out in a structured way and
- we can see you know the salaries and
- wealths together they're aligned by the
- time
- so that already helps substantially in
- understanding what's going on here with
- the numbers
- we had this in excel basically because
- you're always kind of working in tables
- in excel
- but we didn't even get to a table format
- in python yet in python we've just been
- printing out sentences which say
- you know at year three you would have
- 63,000 as a salary and so we haven't had a
- good way of displaying this information
- in python yet
- so looking at this table i mean
- certainly
- you can look at it and get some
- conclusions from it
- uh you know you can see looking at the
- salaries okay it's increasing a little
- bit year by year
- and then when we hit year five we have
- this jump here
- that jump representing a raise from a
- promotion
- uh versus the cost of living raises that
- come
- every other year so
- you can definitely see that but it takes
- some time
- looking at the numbers to understand
- that
- you can't just immediately glance at
- this and understand okay every five
- years
- there's a promotion the salary is
- going to jump for the promotion
- you can see that if you just look at the
- numbers one by one and identify these
- patterns but
- it's not immediately obvious just
- looking at the numbers
- and that's where visualization can be
- really helpful
- and we'll see examples of what this
- looks like
- visualized so when we think
- about how to visualize things in excel
- i mentioned that we're already working
- in tables we probably already have our
- numbers
- laid out in this kind of format so
- that's already
- going down the tabular direction of
- visualizing
- data so we have that in place
- then what we can add on top of that is
- charts and graphs and so
- in excel that really all lives in one
- spot
- you just go and hit insert chart and
- you pop into this kind of menu here and
- you just look through the different
- possible charts
- for your data and select the one that is
- appropriate
- and that's pretty much the end-all
- be-all of excel visualization
- there's quite a bit of customization
- that you can do within the charts but
- everything kind of lives within this
- insert chart
- and then modifying the chart that it
- generates
- in python things are not so
- straightforward
- there are a lot of options for how to
- visualize things in python and that's
- because
- python is an open source language which
- is developed
- by the community millions of people
- across the world
- uh are out there building different
- solutions for how to visualize
- anyone can go and create a new way to
- visualize things
- in python and so many people have done
- that
- and so you have all these different
- packages and all these different
- ecosystems
- of how to visualize things in python
- that have different advantages and
- disadvantages
- and the vast majority of this is
- definitely outside the scope
- of this class you know we could spend an
- entire semester
- just looking at visualization in python
- but we're trying to teach financial
- modeling here so we're going to take
- what is easy and can get us quickly
- to some reasonable charts and graphs and
- just kind of recognize that these other
- options are out there
- and you can expand into using them but
- you don't have to do that you can get
- pretty good results with just what we're
- focusing on in this class
- which is going to be using pandas and
- matplotlib for our visualizations
- but there are a lot of other cool things
- you can do a number of these
- out here um are solving some
- interesting problems and presenting data
- in new and interesting ways
- such as interactive plots where you can
- actually like zoom in
- to the plot and maybe hover over points
- and it will show you more information
- about the point
- and other kinds of interactive features
- uh they're very
- very cool and very interesting ways to
- think about visualizing data that have a
- lot of advantages
- but we're not going to dig into all that
- in this course
- i would recommend you to take a look at
- some of those things
- outside this course such as bokeh and
- holoviews
- are two of the ones that
- i've used for interactive plots and
- they're very
- useful but we're just going to focus on
- pandas and matplotlib in this class
- and why are we focusing there so
- matplotlib was kind of one of the
- first
- ways to visualize data in python and
- a lot has been built around matplotlib
- as kind of a basic system
- for producing charts and graphs
- and matplotlib is going to kind of be in
- the background
- for us really we're going to use pandas
- to
- directly generate the plots and graphs
- whereas pandas actually uses matplotlib
- under the hood we generally don't need
- to think about that very much
- most of the time we're just using pandas
- but i do mention
- here that it's all based on
- matplotlib because
- matplotlib is extremely customizable and
- you can do all the same customizations
- on your pandas plots
- as you can with matplotlib so basically
- pandas is going to be our really simple
- way
- of just quickly creating some kind of
- graph to show off our data
- and then if you need to do absolutely
- anything with customizing it
- then you can go to using matplotlib to
- make those adjustments
- so most of the time you're not going to
- need to really think about matplotlib
- but if you want to customize your graph
- in some way
- then you can just google about how to do
- that in
- matplotlib and you'll be able to apply
- that to your pandas plot
- so that's the overview of financial
- modeling visualization and how we're
- going to focus in this course
- we'll come back next time to go through
- an example of
- doing visualization in excel so thanks
- for listening
- and see you next time
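The point made above, that a plot created through pandas can be customized with ordinary matplotlib calls, can be sketched as follows. This is a minimal illustration only: the wealth figures, column names, and label text are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; not needed in Jupyter
import pandas as pd

# Hypothetical wealth values over five years
wealth = pd.DataFrame({
    "Year": [1, 2, 3, 4, 5],
    "Wealth": [15000, 31500, 49600, 69400, 93000],
})

# Step 1: let pandas create the basic line chart
ax = wealth.plot(x="Year", y="Wealth", legend=False)

# Step 2: the returned object is a regular matplotlib Axes,
# so matplotlib's own methods can be used to adjust it
ax.set_title("Wealth over Time")
ax.set_xlabel("Time (years)")
ax.set_ylabel("Wealth ($)")
```

Because `ax` is a standard matplotlib object, anything found by searching for how to do something in matplotlib should apply to it in the same way.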
Visualization in Excel Example
Notes
Recommended Charts is a nice way to scan through a few possibilities that will probably work well for your data, but take a look at All Charts if nothing seems right.
Make sure that you have an appropriate title and axis titles for your chart so the reader immediately knows what it is about.
Transcript
- hey everyone
- nick derobertis here teaching you
- financial modeling and today we're
- going to go over an example of doing
- visualization
- in excel as part of our lecture series
- on
- visualization and understanding complex
- results
- so we're going to work with the dynamic
- salary retirement model
- that we've built out in the first part
- of this course
- and we're going to now add visualization
- to the excel side
- of that model so i'm going to jump over
- to
- the excel for the dynamic salary
- retirement model
- so we already have our main output
- coming here years to retirement but it
- would be nice to be able to see here
- also
- what the salaries look like over
- time what does the wealth look like
- over time and being able to see that
- very quickly in a very digestible format
- of course we could just bring over
- the salaries and the wealth into a table
- and
- show that on the inputs and outputs tab
- as well but it's not going to be easy to
- just glance
- at these numbers and understand patterns
- in those numbers over time
- it would take a lot of really closely
- looking at the numbers to understand
- those patterns
- so instead let's produce some
- visualizations of those numbers
- so looking at the salary we want to make
- a plot of the salary over time
- and all of that is going to come from
- insert and then this charts section of
- the insert
- tab in the ribbon and the first thing
- that we want to do
- is we want to highlight the data that we
- want to plot
- so we want to plot the salaries we don't
- care
- about any of these intermediate
- calculations we just want the time
- and the salaries so one way to go about
- that
- is to just select those two columns
- before you go to create the chart
- so the easiest way to do that start over
- here
- and then on windows it's going to be
- holding shift and
- control on mac i believe it's shift and
- command
- and then you press down and it's going
- to highlight that entire column for you
- and then you can come over
- here to the salary cell hold control
- that would be command on mac
- and click and now you can see we've got
- this whole column as well
- as this cell and now that our last
- selection was on this cell we can do
- that same
- shift and control or command trick hit
- down again
- and now we have both of these two
- columns selected
- so from there i always like to go to
- recommended charts
- first because that will usually tell you
- the best charts
- for your particular data
- and you can look through the recommended
- charts for what you think
- is appropriate for your data
- but then if you don't see something that
- you like then you can go over to this
- all charts tab
- and explore through all the different
- options that you have
- but here the recommended chart did bring
- up
- a nice representation for this data
- which is a simple
- line chart so here we can see the
- salaries
- over time going up so let's go ahead and
- hit ok
- to insert that chart so now we can see
- we have this chart over here
- now i said that we wanted to have the
- chart over on the inputs and outputs tab
- but now i have it on the salary tab
- because that's where the salaries were
- so the reason that i made it over here
- is because it's very easy now to just
- click this
- and i'm going to hit control x that
- would be command x on mac
- you can see it disappears because i have
- now cut
- the plot and now i can paste it
- control or command v over here and
- everything stays
- linked together so you know now if i
- change
- how often the promotions are then you'll
- see that immediately changing in the
- graph as well
- everything is still linked together
- and this i think is generally easier
- than trying to
- start a chart on this page and then
- reference over
- to another page to get the data
- uh one other way that we could have
- selected the data
- to produce the plot is just to select
- everything
- and then uh insert the chart and then
- just let it be created and then after
- it's been created
- you can then see uh
- here where we can actually adjust the
- data which is included
- so now same way it also only includes
- the salary
- so now we have this chart of salary and
- we can see the salary over time this is
- already a lot more clear what's going on
- we can see those discrete jumps
- every time that there's a promotion
- one thing that's not so clear just
- looking at this is
- what this axis means
- since the plot says this is salary it's
- fairly clear that this is a line of
- salary
- and these values represent the amount of
- the salary
- but here it's not clear what these
- numbers mean so
- you definitely want to add an axis title
- for this
- so this plus over here that comes up
- whenever you click on the graph
- is how you can easily add additional
- elements to the chart and here the
- element we want to add
- is going to be a primary horizontal axis
- title so we add that and then we can
- double click in here
- to change out the name of that and this
- is going to be
- uh time in years
- and that way it's a lot more clear what
- this actually
- represents so
- now this looks like a pretty good plot
- for salary
- but maybe we wanted to display this in a
- different way
- so it is possible to change the chart
- type after you've created it
- also uh it's possible to copy plots
- so that you can you know play around
- with different versions
- until you settle on something that you
- like so i'm going to go ahead and just
- copy ctrl
- c and then paste ctrl v this plot
- now we can see we have the same exact
- plot everything
- is still going to be linked together to
- the model
- and now we can click on this plot and we
- can go to the chart design tab
- and that allows us a few different
- things here
- one it's very easy to pick a different
- style
- for the plot just up here
- and the other is uh you can change the
- chart type here as well
- so you know instead of a line maybe we
- want a scatter plot
- with connected
- lines then it's more clear that we have
- observations for each year
- and not necessarily you know completely
- filled out
- in this axis um or maybe
- instead you wanted to represent this
- with
- maybe something like
- columns instead that can get at the same
- exact
- kind of concept so it's
- easy after you've already created the
- chart to go back and switch
- the type of chart as well
- so now that we've seen how to do that
- let's go ahead
- and bring over the chart for
- wealth as well so coming over to the
- wealth tab
- then we just do the same thing that we
- did
- previously we're just going to
- grab these two columns here the time
- column
- and the wealth column once again go to
- insert
- recommended charts here and let's go
- with the line chart here
- and we can see also the wealth over time
- i'm going to add that title for the
- horizontal axis that this is
- time in years
- and then going to cut this so we can
- bring it over
- to the inputs and outputs tab
- so now we have both the salary and
- wealth and of course you can
- you know drag these to change the size
- so that everything will fit
- on the screen appropriately
- and now we can see everything will
- change together
- so it's a much easier way to get a quick
- overview of what's going on in the model
- and one other way we could have gone
- about this is
- right now we have separate graphs for
- the salary and the wealth
- we could have potentially combined them
- into
- the same plot because they have the same
- axes right they're both
- over time they're both talking about a
- dollar amount
- on the y-axis so it could make sense to
- have them on the same plot
- now the reason that i didn't do that is
- because
- the scale of these axes is quite
- different
- for the x it's the same but for the
- y
- you can see the wealth axis here is
- actually 10 times as large
- as the salary axis so if we plot these
- two
- on the same plot then we're barely going
- to be able to see
- the salary line so we can just quickly
- see
- uh the example of that if i just grab
- the salary and the wealth uh here
- to put onto one chart that
- um we create that chart and it basically
- looks like the salary doesn't really do
- anything
- it increases but it looks almost like
- it's totally linear you can barely tell
- that the promotions are going on in
- there
- whereas the wealth just totally eclipses
- that by the end
- so um doesn't really make sense to put
- these two
- on the same plot when it's so much more
- clear what's going on
- when they're on separate plots because
- of the different scale
- of the axes so that's
- an overview of how to do visualization
- in excel
- next time we're going to come back and
- learn about
- pandas and python which is going to give
- us a table kind of data structure and
- we're going to start working towards
- visualization
- in python so thanks for listening and
- see you next time
Introduction to Pandas
Notes
We will be using pandas to produce tables and graphs in this course through the custom DataFrame type.
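A minimal sketch of the two ways of building a DataFrame by hand that the lecture walks through: create it empty and assign columns one at a time, or pass all the rows at once as a list of tuples. The people and a few of the values follow the lecture's example; the exact column labels and the remaining values are assumptions made for illustration.

```python
import pandas as pd

# Way 1: start with an empty DataFrame, then assign columns one by one
df = pd.DataFrame()
df["Name"] = ["Joe", "Jim", "Mary"]
df["Weight"] = [150, 200, 140]                 # Jim's and Mary's weights are assumed
df["Reservation Price"] = [10.12, 15.0, 12.5]  # Mary's price is assumed
df["% Active"] = [0.6, 0.3, 0.8]               # Jim's and Mary's shares are assumed

# Way 2: give it all the data at once.
# Each tuple holds the values for a single row; columns= names the columns.
df_all_at_once = pd.DataFrame(
    [
        ("Joe", 150, 10.12, 0.6),
        ("Jim", 200, 15.0, 0.3),
        ("Mary", 140, 12.5, 0.8),
    ],
    columns=["Name", "Weight", "Reservation Price", "% Active"],
)

print(df_all_at_once)
```

Both approaches end up with the same table. The third common route mentioned in the lecture, loading from an external source such as an Excel file, is not shown here.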
You will also find these DataFrames useful for general problem-solving purposes. Many use them as a primary way to store and work with data in their models
Pandas does far more than we will cover in the course. It is the top Python package for manipulating and analyzing data. I use it extensively on a daily basis.
In this course, with Pandas we will focus on loading and exporting data, doing math, other basic operations and summarizations, and presenting data in a tabular format
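The basic selection operations covered in the lecture (picking columns, picking rows by position, looking up a single value, and querying with conditions) can be sketched roughly as below. The column labels and any values not mentioned in the lecture are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("Joe", 150, 10.12, 0.6),
        ("Jim", 200, 15.0, 0.3),    # weight and % active are assumed values
        ("Mary", 140, 12.5, 0.8),   # all of Mary's numbers are assumed values
    ],
    columns=["Name", "Weight", "Reservation Price", "% Active"],
)

# One column: a column name in brackets returns a Series
prices = df["Reservation Price"]

# Several columns: pass a list of names (hence the double brackets)
name_and_weight = df[["Name", "Weight"]]

# One row by integer position (zero-based); also a Series
first_row = df.iloc[0]

# A single value by row label and column name
joe_price = df.loc[0, "Reservation Price"]

# Query rows with a condition; note the DataFrame variable is repeated
cheap = df[df["Reservation Price"] < 14]

# Multiple conditions: wrap each in parentheses and join with &
mask = (df["Reservation Price"] < 14) & (df["% Active"] > 0.6)
cheap_and_active = df[mask]

# The same query in loc, keeping only some of the columns
mary_name_weight = df.loc[mask, ["Name", "Weight"]]
```

Writing `df["Name", "Weight"]` without the inner brackets raises a `KeyError`, and `df.iloc[3]` on this three-row table raises an `IndexError`.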
Transcript
- hey everyone
- nick derobertis here teaching you
- financial modeling and today
- we're going to do an introduction to
- pandas
- and this is part of our lecture series
- on understanding complex
- results using visualization
- so we want to get to be able to
- visualize our results in python
- but there's a little bit more that we've
- got to learn before we can get there
- so in python um
- we talked about how there's lots of
- different options
- for how we can visualize the data
- but we're going to focus on using pandas
- which uses matplotlib under the hood to
- do our
- visualizations so we've got to go and
- learn about how to use pandas
- in the first place because it does
- a lot more than just plotting
- and tables um
- but we gotta learn pandas basics
- and the basics really revolve around
- the pandas data frame so the data frame
- is this new type of object that we
- haven't worked with before
- you know we've worked with lists numbers
- strings
- all the basic kinds of data types
- uh but i've mentioned how anyone can
- create
- their own data type by writing a class
- and lots of third-party packages out
- there
- create these custom classes for you to
- use
- which give all sorts of functionality so
- that has been
- done in the pandas library they've
- created the data frame
- class and so that defines this data
- frame type
- that we can use and when you think of a
- data frame
- think of basically a table so
- you know with lists numbers
- all the dictionaries all the data types
- we've thought about so far in python
- nothing really is like a table
- but we have these data frames that can
- fulfill that role for us
- so before we even get into the graphing
- side we've got to learn about how to
- work with data frames
- with these tabular representations of
- the data
- and there's quite a lot that we can do
- there
- and it will really become a basic tool
- in your toolkit for solving problems in
- python
- a lot of people use data frames
- as a very fundamental building block in
- their models as a way of storing and
- working with
- data so what we're learning here has a
- lot of applications beyond even just the
- visualization
- so what is this data frame thing that
- we're about to learn about
- again essentially you can just think of
- it as a table
- it has rows it has columns but there's a
- lot that we can
- do with them so some of the features
- that
- data frames have you can
- you know even after it's created you can
- add or remove
- rows and columns you can aggregate
- that data in a lot of different ways
- with summary statistics and grouping by
- different things
- you can go to and from different
- data formats like excel files you can
- read in excel files export to excel
- files
- as well as working with databases
- and lots of other output formats
- you can take multiple different data
- sets and you can put them together by
- joining and merging and concatenating
- you can resample and reshape
- your data thinking about you know
- different frequencies you have monthly
- data you want to take it to
- annual data or
- other ways of reformatting your data
- you can slice and dice and query from
- your data
- in any sorts of ways maybe
- you have data on countries and you just
- want to get the data for the us well you
- can just query for that
- us data and you can deal with
- different patterns in your data such as
- duplicates
- you can remove those duplicates you can
- remove missing values you can fill in
- missing values
- lots of different things you can do to
- manipulate the data
- which you have in your data frames
- so this is why i mentioned that this is
- really a fundamental
- building block for most people as they
- build out their models
- because there's so much that you can do
- with it and this is
- really the gold standard library for
- working with tabular data in python it's
- very very popular
- it's used all across finance and the
- data science
- industries so most people who
- work in python are at least familiar
- with pandas
- and a lot of them use it on a daily
- basis i definitely do
- so what does this data frame basically
- look like how do we create one how do we
- work with it
- so here's a very simple example
- of how you can create a data frame we'll
- talk about
- a few different ways that we're going to
- create data frames in this course
- but here's what i think is the easiest
- way to get started with one
- is of course this is a third party
- package
- that we're using so we do have to import
- that so
- we're going to import pandas and the
- convention that everybody uses
- if you google anything about pandas
- you're going to see people using the
- same convention
- is to import pandas as pd and then you
- always do pd dot
- whatever you want to use from pandas so
- to create this data frame it's going to
- be
- pd.DataFrame and we're going to assign
- that to a df
- variable so that we can use that going
- forward
- now i had everyone install python
- with anaconda in this class and so
- pandas
- is already installed within anaconda so
- you don't have to go
- and install that but if you
- did not use anaconda you have some other
- python distribution that does not have
- pandas included then you will have to go
- and
- install that package before you can use
- it but as long as you installed anaconda
- it should be there already for you
- so we have this data frame and right now
- it's uh an empty data frame we didn't
- give it any data to start with
- so that's where we are as of here an
- empty data frame no data in it
- so then what we can do is we can add
- some columns to this data frame
- so we're going to add a sales column
- with these values
- and we're going to add a category column
- with these values
- and then when we look at the data frame
- we'll see something like this
- a tabular representation of the data
- where we have
- sales and category as our columns and
- then we have the rows of this data
- that we passed in so that's a basic way
- of creating a data frame first you make
- an empty data frame
- then you assign the columns one by one
- and we'll look at two other ways that we
- generally make
- data frames uh being that you can create
- it all at once with the data
- all within a single command
- and the other main way is to load from
- some kind of external source like
- reading in an excel file to create your
- data frame
- so then let's jump over to the jupyter
- notebook example on how we can
- work with these data frames
- so
- we're over here on the intro to pandas
- and
- visualization notebook and
- so first the thing that we're gonna do
- as always is we're gonna import what we
- need so we're gonna import
- pandas as pd so that we can use that
- throughout and so the first way that
- we're talking about creating a data
- frame
- we create it first assign columns later
- so here
- you make the empty data frame and if you
- look at the empty data frame
- you basically see nothing because
- there's no data in there
- then we can go and we can assign a
- column to it
- so this is going to be assigning the
- name column
- and it's giving the values joe jim and
- mary
- so then when we run that then we see
- this table
- representation with the column name name
- and the values joe jim and mary
- and
- the reason that we use brackets here
- whereas you know before we've just
- looked at brackets being able to look up
- something in a dictionary
- or look up something in a list
- classes in python define the way you
- work with objects
- and the way that pandas has defined
- their data frame class
- is why you work with it in this way so
- it's all
- in the particulars of the implementation
- of the class
- so in order to get something useful in
- this data let's assign a few more
- columns
- uh so for each of these people now we
- have a weight
- uh we have a price that they're willing
- to pay
- for some good the reservation price and
- we have a percentage of the time that
- they
- spend doing activity like outdoor
- activity or something like that
- so now we can see all of that laid out
- in a table
- in a nice clean way
- so the other main way to create a data
- frame is give it all the data at once
- give it all the columns at once and
- it's definitely a little more
- complicated syntax to do that
- but it can be useful in some cases
- so here we're passing
- to the data frame a list of tuples
- so here's the outer list and then each
- item in the list
- is itself a tuple
- and each of those tuples has the data
- for a single row in the data frame
- so here's all of joe's values here's all
- of jim's values here's all of mary's
- values
- and then after this first argument of
- the list of tuples of the data
- then it's a comma and then we can say
- columns equals
- and we can give it a list of the names
- of the columns
- for the data frame so those columns come
- directly
- to be the names of the columns and so
- doing this we can create the same data
- frame that we made above
- by assigning the columns individually
- just all at once
- in a single command
- so now we have uh this data frame which
- has a few different people
- and some characteristics of those people
- how do we
- now just saying we have that data frame
- ready how do we pull out what we want
- from that data frame so
- similar to a dictionary we can put the
- name of the column
- as a string into the brackets and that
- will pull that column
- out of the data frame so this
- reservation price
- each of these values have now come out
- when we access
- the reservation price
- and what we have here is actually a
- series
- a series is the other main
- class within pandas and it represents a
- single row or a single column
- of a data frame
- so that's how we get one column
- you can also select multiple columns at
- a time
- by passing it the list of the columns
- that you want so do notice that we have
- the double brackets here
- because the outer bracket means i want
- to look something up in the data frame
- and then the inner bracket means i want
- to look up this list
- of columns so if you omit
- that uh brackets the inner brackets
- then it's not going to work
- appropriately you're going to get a key
- error
- because it's trying to now look up a
- single column
- with the name of this whole thing which
- doesn't exist
- you do need to have that second set of
- brackets so that it's saying
- i want to look up a list of columns and
- give me each of those columns
- individually
- so that's selecting columns now how do
- we select
- rows so there's uh
- this iloc we can do on the data frame
- that's integer location so that's
- just uh saying zero would be saying give
- me the first row
- once again zero based indexing as with
- nearly everything in python
- so zero is give me the first row so we can
- see
- again this is another series that we're
- getting
- because any individual row or column of
- data frame is a series
- and that's why it displays a little bit
- differently um
- but we can see we have all of joe's
- values
- joe had the 150 weight and that's what
- we're indeed
- getting here so and you know if we went
- to one then that would be
- getting jim's values and two gets merry
- values
- and if we try to go above that we would
- get an index error because there are not
- that many rows
- in the data frame and that's where you
- get this single positional indexer is out of
- bounds
- you know you should see index error out
- of bounds you should know that you're
- trying to go
- further than exists in the data frame
- we can also pull out something by both
- rows and columns
- so that we can do with loc so
- loc uh you can give it two arguments
- in contrast to the iloc and so with loc
- we tell it first what row we want to
- look up and then what column
- we want to look up so we're going to
- look up the first row
- and we're going to look up the
- reservation price column so here's the
- data frame again just for reference
- so we want to go to the first row here
- and we want to look up the reservation
- price column
- so that's why we get the single value of
- 10.12
- and then we can also query for whatever
- rows or columns of the data frame that
- we want
- so here we're going to get any rows
- which have a reservation price
- which is less than 14. so we can see the
- row
- the jim row which had a 15 reservation
- price is no longer here in this
- query result of the data frame but we do
- have the other rows because their
- reservation price was less than 14.
- um so you can kind of read this syntax
- as
- give me the data frame where
- uh the data frame's reservation price
- is less than 14.
- so you do have to like repeat this data
- frame variable
- uh you can't just do something like
- that it's not going to understand that
- um
- it does have to be look up the data
- frame
- where the data frame's reservation price
- is less than 14 and then it can work
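As a sketch of that query syntax, assuming the same hypothetical column names as a stand-in for the lecture's table:

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Reservation Price": [10.12, 15.0, 12.5],
    "Pct Active": [0.60, 0.45, 0.80],
})

# "Give me the data frame where the data frame's reservation price is < 14".
# The inner expression builds a True/False Series; the outer df[...] keeps
# only the rows where it is True, so df has to appear twice.
cheap = df[df["Reservation Price"] < 14]
print(cheap)
```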
- so we can do multiple queries
- at once as well you just have to
- separate them with an ampersand and put
- parentheses
- around each one of your queries so
- here we're doing what we did before and
- getting the reservation prices which are
- less than 14.
- but we're also going to get only those
- rows
- which have the percentage active greater
- than
- sixty percent so joe had sixty percent
- that's not greater than sixty percent
- and so that's why when we run this we're
- left only with
- the mary row that's the only one that
- satisfies
- all of the conditions that we've given
- to it
- but it's very important to include the
- parentheses if you omit any of those
- then it's not going to work
- appropriately
- you don't need them in this special case
- of just doing a single query
- they don't hurt but you don't need them
- whereas with multiple queries you
- definitely do need them
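A short sketch of combining two conditions, with hypothetical column names. The parentheses matter because `&` binds more tightly than `<` and `>` in Python, so without them the expression is grouped differently from what was intended.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Reservation Price": [10.12, 15.0, 12.5],
    "Pct Active": [0.60, 0.45, 0.80],
})

# Each condition is wrapped in its own parentheses and joined with &
both = df[(df["Reservation Price"] < 14) & (df["Pct Active"] > 0.6)]
print(both)   # Joe's 60% is not strictly greater than 60%, so only Mary remains
```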
- um and then we can pass these same kind
- of queries
- into the loc command as well
- and for that um
- we do the same kind of syntax for the
- row part
- of the loc and then we can also pass it
- whatever columns that we want to get
- so we can take this same exact query
- that we had here to get this
- mary row and that comes as the first
- argument
- in loc and then the second argument is
- what columns we want to get
- so here we're getting just the name and
- the weight columns and that's why we see
- just mary's name and weight
- and you also could give it just a single
- column as well
- and that is going to just give you that
- value
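The same query can be passed as the row part of `loc`, roughly as below (column names are assumptions). Note that with a query as the row selector, a single column comes back as a one-element Series rather than a bare number.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Reservation Price": [10.12, 15.0, 12.5],
    "Pct Active": [0.60, 0.45, 0.80],
})

mask = (df["Reservation Price"] < 14) & (df["Pct Active"] > 0.6)

# Rows selected by the query, columns selected by a list of names
name_and_weight = df.loc[mask, ["Name", "Weight"]]

# A single column name gives a Series; .iloc[0] pulls out the scalar
weights = df.loc[mask, "Weight"]
mary_weight = weights.iloc[0]
print(name_and_weight)
print(mary_weight)
```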
- so that's selecting things out of data
- frames
- let's look at some basic math we can do
- so
- you can take an entire column at once
- from a data frame and do math with it
- which definitely simplifies things
- whereas before we would have always had
- to create loops
- to be able to apply these operations
- across all the different values of our
- data
- here it's one simple expression in
- pandas
- so we can just add 10
- to the reservation price column and we
- get a new column which has 10
- added to each of the values we can also
- do math with multiple different columns
- of the data frame
- we could multiply this price by the
- percentage active
- and get the result of that here
- and we can um
- any data frame which has just numbers we
- can
- do math with that as well so here
- taking the reservation price and weight
- columns out of the data frame
- we can multiply those both by 10 all at
- once
- um and
- you can also you know take the result of
- these things and assign them back
- into the data frame as new columns
- if you would like
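The column math above might be sketched as follows. Column names and the new `Price Plus 10` column are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Reservation Price": [10.12, 15.0, 12.5],
    "Pct Active": [0.60, 0.45, 0.80],
})

plus_ten = df["Reservation Price"] + 10                  # scalar applied to every row
weighted = df["Reservation Price"] * df["Pct Active"]    # row-by-row product of two columns
scaled = df[["Reservation Price", "Weight"]] * 10        # math on several columns at once

# Assign a result back into the data frame as a new column
df["Price Plus 10"] = plus_ten
print(df)
```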
- we can also do some summary statistics
- very easily
- with data frames so
- dot describe kind of gives you the
- overview of all the different
- summary statistics you get how many
- there are of each you get the average
- standard deviation minimum different uh
- percentiles
- or you can do each of these things
- individually so
- you know dot mean to get the averages
- dot std to get the standard deviations
- dot quantile to get here would be the
- 50th percentile
- which is the median and you can do
- whatever percentile that you want in
- this
- min to get the minimum max to get the
- maximum
- and any of these operations you can
- apply it across the row instead of the
- column
- if you would like um and so
- this is taking an average of all the
- numerical values
- in a given row and so this is saying
- this is the average of joe's values
- this is the average of jim's values and
- so on
- and so you just pass this axis equals
- one argument
- to one of these summary functions to be
- able to do that
- and axis equals one just means work over
- the rows instead of working over the
- columns
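A sketch of these summary methods on hypothetical data. It selects the numeric columns explicitly first; that is an assumption made here so the example behaves the same across pandas versions, not necessarily what the lecture notebook does.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Reservation Price": [10.12, 15.0, 12.5],
    "Pct Active": [0.60, 0.45, 0.80],
})

numeric = df[["Weight", "Reservation Price", "Pct Active"]]

summary = numeric.describe()        # count, mean, std, min, percentiles, max
col_means = numeric.mean()          # one average per column
col_stds = numeric.std()            # one standard deviation per column
medians = numeric.quantile(0.5)     # 50th percentile, i.e. the median
col_min, col_max = numeric.min(), numeric.max()

# axis=1 works across each row instead of down each column
row_means = numeric.mean(axis=1)
print(summary)
print(row_means)
```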
- and then you know these are all the kind
- of built-in things but pandas also has a
- way that
- you can apply any function you want
- across
- a data frames values so here's another
- application where
- creating functions for all our different
- logical steps becomes very useful
- because then we can take this function
- we created and apply it
- to every single cell in the data frame
- um
- and so here this is a simple function
- that just takes the value
- and multiplies it by 100 and returns the
- result
- and it's this applymap that is able to
- take a function and apply it
- to each individual cell in the data
- frame
- and return a new data frame which has
- the result of all those calculations
- so we run that and we can see we get 100
- times each of the values that were there
- before
- and even for the strings we get uh those
- repeated 100 times
- so it was able to take this and apply it
- to all the different
- cells in the data frame now one thing
- that you'll notice
- is there's no open close parenthesis
- and passing some kind of number or
- arguments that you would expect to
- normally see
- with a function and that's because
- when we um
- do that we're calling the function right
- so we pass it a 5 and we get 500
- and now the result of this is just that
- 500
- this function has been called it's been
- evaluated it's gone now we're just left
- with 500.
- when we don't do the parentheses that's
- the function
- a reference to the function itself so
- that's why we see
- what is this thing it's a function it's
- the function multiply by hundred
- and so with this structure we're passing
- the function itself
- into this other function applymap
- so that applymap can take this and
- apply it to each of the cells we have to
- pass the function
- itself and not the result of calling
- that function so this definitely is a
- concept that a lot of people struggle
- with
- in the beginning uh but we want to pass
- the function itself and so we don't put
- the parentheses
- whereas in essentially every other
- use case of functions
- you do want to call it and get the
- result of that
- and so you use the parentheses and pass
- whatever arguments
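The difference between calling a function and passing the function itself can be sketched like this. The function name `multiply_by_100` and the table are assumptions. Recent pandas releases rename `DataFrame.applymap` to `DataFrame.map`, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
})

def multiply_by_100(value):
    """Multiply a single cell's value by 100."""
    return value * 100

called = multiply_by_100(5)    # with parentheses: the function runs, we get 500
reference = multiply_by_100    # without parentheses: the function object itself

# applymap was renamed to map in newer pandas; use whichever exists (assumption)
cell_map = df.map if hasattr(df, "map") else df.applymap

# Pass the function itself so pandas can call it once per cell
result = cell_map(multiply_by_100)
print(result.loc[0, "Weight"])
```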
- so that's a quick overview on pandas
- and data frames and we're going to come
- back
- next time to look at some ways that we
- can style
- our data frames to make them look better
- so thanks for listening and see you next
- time
Styling Pandas DataFrames¶
Notes¶
Just as it is important to format tables in Excel to increase readability, we should do the same with any Pandas DataFrames we display to the reader of the model
There is a philosophical difference in how styling is done in Excel versus Pandas. In Excel, you directly format the table which stores your data. In Pandas, you create a styled object immediately before displaying it. That object is separate from the original data; the data itself does not get formatted
Because of this difference in philosophy, the way I recommend working with Pandas styling is to create a styler function that accepts a DataFrame and returns the styled object. This way you can just call it on your DataFrame as you display it. This has a couple advantages: your data logic is completely separate from formatting code, and you can apply consistent formatting to multiple different DataFrames easily.
Transcript¶
- hey everyone
- nick derobertis here teaching you
- financial modeling today we're going to
- be talking about
- styling pandas data frames as part of
- our lecture series on
- understanding complex results with
- visualization
- so we learned last time about these
- pandas
- data frames which
- let you get a tabular representation of
- your data
- in python and
- we just had kind of the basic default
- way that these things looked
- and now we're going to learn how we can
- add our own custom
- styles and formatting to our data frames
- so where this is important is you
- definitely want to have
- number formatting when you display
- any tables to the reader of your model
- so that it's much easier to understand
- what those numbers mean
- and you may want to add additional
- styles
- especially something like conditional
- formatting where
- you highlight certain values based on
- some conditions
- to make it easier for the reader to
- understand the
- data in your table
- now most people probably already are
- very familiar with styling
- tables in excel you know you can change
- the number formatting you can change
- uh add borders you can change colors
- all these kind of things are just up
- there available in the ribbon
- with pandas uh of course you're using
- lines of code
- to do this formatting uh
- but it's a lot more flexible than the
- formatting
- in excel and the reason that
- it's more flexible is
- the way that it's built so
- when you see that table representation
- of the data the way that
- pandas the authors of pandas have made
- this happen
- is it actually produces
- html and css for
- this data to display this data frame
- and html and css for people that aren't
- familiar
- uh together make up the markup language
- which describes how any web page
- looks so anything that you've ever seen
- on any website
- you could potentially make your data
- frame look like that
- and basically anything at all because of
- how flexible
- html and css are
- and so you can go all the way to
- that complete control by directly
- manipulating the html and css
- of the representation of your data frame
- but that is definitely a little bit more
- advanced
- they've also added a lot of more
- convenient things that you can just
- easily do out of the box with simple
- commands
- that represent some of the most common
- things that you would want to do
- with your formatting so things like
- changing colors size positioning of text
- adding captions to your table
- conditional formatting
- and uh drawing a bar graph within the
- table
- these things are all uh kind of well
- explained and easily there out of the
- box
- but anything that you can possibly
- imagine
- you can do that with styling data frames
- so let's go ahead and take a look at the
- example
- notebook on what we can do so
- first let's talk about number formatting
- so
- we've already learned about number
- formatting with just plain numbers
- in python and we've been doing that
- using f strings
- so with the f strings uh we take
- the number and we put it in this
- quoted string with an f on the
- beginning
- and then inside that we do the curly
- braces
- and then within the curly braces we put
- our variable
- on the left and then a colon and then we
- put the format
- code that we want to use to describe the
- data
- and so we're going to take that same
- concept over to pandas
- as well now before we actually
- style our first data frame i do want to
- mention kind of a philosophical
- difference between the formatting
- in excel and the formatting of pandas
- data frames in python
- so in excel you have your data in a
- table and then you go and you format
- that table and now
- the original data is formatted
- but in python with pandas
- we have a separation of the data and the
- display
- of the data instead of directly styling
- the original data
- we actually create a separate object a
- style
- object which
- is the formatted representation of the
- data
- but that does not carry those changes
- back to
- the original data the original data is
- still unformatted
- so we only apply this formatting just as
- we're about to display
- this information to the reader of the
- model
- and that's the same way that things are
- working here with the f strings
- and formatting numbers in python
- you know we did this but my num
- is still the same exact number it
- doesn't have this
- formatting on it it was just as soon as
- we were ready to display that to the
- reader of the model
- then we added the formatting just in
- that spot it did not carry back
- to the original data and it's going to
- be the same exact concept
- with our pandas styling
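A small illustration of that point with f-strings. The number and the format code are made up for the example.

```python
my_num = 1234.56789

# variable on the left, then a colon, then the format code
formatted = f"${my_num:,.2f}"

print(formatted)   # the formatted text, only for display
print(my_num)      # the original number is unchanged
```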
- so now let's look at we were able to
- apply you know two decimal place
- formatting here if we have more decimals
- then it cuts it off at two because of
- this formatting
- let's look at applying that to the data
- frame as well
- so most of the things that we're going
- to do
- with styling data frames it's going to
- be dot style
- on the data frame and a lot of them are
- through this
- dot format then uh under the dot style
- so df.style.format and
- then for the number formatting we pass
- it to this format and we give it a
- dictionary and the dictionary should
- have
- the keys of the dictionary should be the
- names of the columns
- that we want to format so here we're
- formatting the reservation
- price column and the values
- should be this format
- you know very similar to what we did
- here a format string
- for how we want to format the data now
- the one difference that you'll see
- in what we had here versus what we have
- here
- is that this one, one, does not
- include the f
- and two it does not include the variable
- itself
- so after your braces you then start with
- the colon
- as the first thing it knows that
- basically the variable is going to get
- inputted in here
- and then colon and then the same exact
- format code
- that you use for the f strings
- so when we run this then we see
- uh that it has put that dollar sign and
- the two decimal place
- formatting on there just to show the
- decimals if we put one
- then it would only be a single decimal
- place showing up there
- now we created this s variable
- as df.style.format the result of that
- and when we look at s it looks like a
- data frame
- right so you might conclude that you know
- now this
- s is our data we just have a formatted
- version of the data like we do in excel
- but actually if we check the type of
- this it's a
- styler object no longer a data frame
- and all of a sudden we can't do
- some of the things that we did before
- with it because it's not a data frame
- anymore
- it's a styler and so that's why i say
- you always just want to
- apply this formatting immediately before
- you're going to display it
- and not try to format your existing
- data frame because that's just not the
- way this works
- and we'll look at a pattern of what i
- think is a nice
- way to apply this formatting in your
- models
- but let's look through the other
- formatting methods first
- so we can apply that say percentage
- formatting
- to percentage active here uh zero
- decimal place
- percent formatting um
- and you can continue to chain additional
- formatting on a styler
- object so you can style stylers you can
- style data frames
- but you can only do math and other
- operations with
- data frames um
- so here we're adding additional
- formatting of we also want to format the
- percentage active volume
- column with the percentages so now we
- have both of these two columns formatted
- um but you could have also just directly
- done it from the data frame
- in the first place by giving it both of
- these number formats
- at the same time just passing both those
- in the dictionary
- so that's number formatting now let's
- look at cell formatting
- so cell formatting we can do with again
- pretty much everything is on dot style
- and then
- just as we looked at how we can do
- applymap to apply a function
- to every cell of the data frame
- we can similarly use dot style dot
- applymap
- to apply formatting to each cell
- of the data frame of the styler object
- um so
- we can see here what this is doing we
- can already see the result of this this
- made all the cells values blue
- and so what we did here was we created
- a function that returns
- the style that we want for each cell
- so this is the general structure of how
- to format cells in pandas you write
- functions which return the styles you
- want to apply
- and then you use applymap to apply
- those styles
- so here we're setting the color to blue
- for
- each one of these cells
- and this color blue this is actually
- uh css as i mentioned uh
- html and css is all what's going on in
- the background here
- so uh you can take a look at um
- you know here's a color picker where you
- can
- look at different colors and you can
- grab
- the value that you want for
- your formatting and put that in there
- whatever color that you want
- or there are just you know predefined
- names
- for a lot of common colors as well and
- you can just google about
- css colors to learn more about all this
- um so then color
- is about the text color and background
- color
- is about the background color of the
- cells
- so similarly here we've created a
- function which returns this background
- color
- um of light green and we're applying
- that to all the cells and so we see them
- all highlighted in light green
- text align will let you control the
- alignment
- of the text within the cells so here
- we're now centering
- each of these values
- and you can do multiple at once
- by just putting semicolons uh
- between each of these definitions so
- you should always be returning a single
- string but
- each style is going to be the name of
- what you want
- to affect and then a colon
- and then the style you want to apply
- and then for any additional ones you put
- a semicolon and then repeat the same
- kind of thing again
- so here text color to white uh
- background color to black
- alignment to center and we see all of
- that happening here at once
- so that leads us into thinking about
- conditional formatting where conditional
- formatting is
- we want to apply formatting to some
- subset
- of the cells whatever cells meet
- some certain condition that we want to
- apply
- so here let's think about
- you know this is a very short simple
- example we only have three people it's
- easy to just look through
- the values for three people but you
- could imagine you could have
- hundreds thousands of people in this
- table
- and what we want to try and do here is
- highlight the people who are not very
- active
- and so if their activity percentage is
- less than 50
- then we want to highlight that
- cell so the way that we can do that
- is just adding that conditional logic
- into our function
- which returns the style and knowing that
- if we return an empty string
- then it's not going to apply any style
- but if we return that string with the
- value
- then it is going to apply that style
- so here the function does two things
- so first um it checks
- if the value we're getting is a string
- so this is
- something we haven't seen before is
- instance is a way of checking the type
- of some object so here we're checking
- is the value that we're getting a string
- if so we're not going to format it
- and the reason we've added this let me
- go ahead and comment this out
- so that now this is all that is
- executing in the function
- we run that and we see we get an error
- right away
- type error because less than
- is not supported between a string and
- float and that's because uh
- we're checking if value is less than 0.5
- and we have jim
- joe and mary as values in there and so
- now it's trying to say
- uh you know like like joe
- is joe less than point five we see that
- same exact
- error so that's that's the problem
- that's going on here we don't want to
- compare the strings we never want to
- highlight the strings we just want to
- ignore them
- we do want to take the numeric values
- and do that comparison
- so that's all that this first part means
- is just basically skip
- the string values just return no style
- for the string values then we come to we
- know we've got a number now
- so now we're comparing is that number
- less than 0.5
- if so we're gonna make the background
- pink
- um otherwise the value is greater than
- 0.5 and so we're going to
- just return no style and so that's why
- we see
- after we apply this style to the data
- frame
- that only this one
- jim value is highlighted because jim is
- the only
- inactive individual in the data frame
- um and you know say you're trying to
- develop this function
- and things are not working out the way
- that you expected
- one shortcut or pattern you can use to
- be able to test out
- you know what what am i actually getting
- in each of these cells as far as the
- styling
- if instead of doing df.style.applymap
- you just do df.applymap remember we
- learned that's taking each cell and
- applying a function to it
- then you can see what styles are going
- to get applied in each cell
- so here no styles to each of these cells
- and here this one is getting
- background color pink so a nice way if
- your styles just are not coming out the
- way you expected
- see what am i actually getting in each
- of these cells
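The conditional styling function and the debugging shortcut can be sketched together. The function name, column names, and values are assumptions; applying the function with the plain data frame method shows the style string each cell would receive.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's table
df = pd.DataFrame({
    "Name": ["Joe", "Jim", "Mary"],
    "Weight": [150, 200, 120],
    "Pct Active": [0.60, 0.45, 0.80],
})

def highlight_inactive(value):
    # Skip strings: comparing "Joe" < 0.5 would raise a TypeError
    if isinstance(value, str):
        return ""
    # Numbers below 50% get a pink background
    if value < 0.5:
        return "background-color: pink"
    # Empty string means "apply no style"
    return ""

# Debugging shortcut: run the function over the data itself, not the Styler.
# applymap was renamed to map in newer pandas; use whichever exists (assumption)
cell_map = df.map if hasattr(df, "map") else df.applymap
styles = cell_map(highlight_inactive)
print(styles)
```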
- and that's formatting individual cells
- we can also do formatting with the
- entire table
- at once so you can hide the index the
- index
- uh is this zero one two thing that goes
- down the side
- um so you don't wanna see that then you
- can just hide that
- you can hide individual columns if you
- want to exclude those just for the
- display purposes
- you can add a caption or title to the
- table
- um and there's a whole lot more that you
- can do
- this gets into uh how you can
- basically um do whatever you want
- with data frames and this example comes
- right out of the pandas
- documentation actually here is adding
- a hover style to the data frame so it
- does something different
- when you hover over it
- so we're not going to go into the
- details here this
- is really getting outside the scope of
- the course but i'm just showing this to
- kind of highlight you can do whatever
- you want
- with these uh styling of the data frames
- really
- the limit is only your imagination and
- willingness to
- figure out the specific code to make
- that happen
- one other nice uh thing that we can do
- very easily
- with these styling of data frames
- is it has this style.bar
- method which draws
- bar plots within the individual cells
- of the data frame and so if
- obviously we just have three rows it was
- already fairly easy to see but if you
- have a lot more this is
- getting a lot more useful uh from this
- we can easily see what are the smallest
- values in each of the columns what the
- largest values in each of the columns
- based off of those bars and you can
- change the color of those
- by passing a color argument and there
- are three different ways that we can
- align
- those bars as well
- so this is the
- default alignment of it
- which is the left alignment
- and so that's saying that the lowest
- value is not going to show up
- the highest value is going to be full
- and the rest are
- kind of spread in between
- or you can align from zero meaning that
- empty bar is representing 0
- and a full bar is
- again the max
- and we also have this mid alignment
- which puts zero
- at the middle negative values are going
- to be below that
- and positive values are going to be
- above that and we can also pass a subset
- to give it
- only certain columns that we want to
- apply this to instead of applying it to
- every numeric column
- so i mentioned i would show you a
- pattern on the best way
- to organize yourself with this styling
- because again you don't apply styling to
- the original data frame
- you're creating a new object a styler
- object which has the formatting which is
- separate
- from your original data so
- that df there does not have any of this
- styling that we just
- played around with in all these prior
- sections the original data is still as
- it was
- um and so the way that we can work with
- this in a nice way
- is by creating styler functions
- so you create a function which takes the
- data frame
- as the input and returns the styler
- object and that way all your
- formatting code is totally separate from
- all your data logic
- and you can potentially style multiple
- different data frames
- in the same way by just passing them to
- the same
- styler function this also shows you how
- you can
- chain a bunch of different uh formatting
- commands
- in one go uh if you use the parentheses
- that you can break this onto multiple
- lines then it looks pretty nice
- as well that first we're going to format
- the numbers
- for the two columns in the ways
- that we talked about and then i want to
- highlight the inactive ones
- and i want to center the values and i
- want to hide the index
- and i want to give it a title of
- personal info and i want to draw the
- bars
- for the reservation price column so
- defining that function
- then we can call that on any data frame
- that we want so now we see the result
- of all those changes all together
- caption
- the centering the bar the highlighting
- the inactive
- the number formatting and now it all
- happens from
- one single function and because we have
- that as a function we can apply it to
- any data frame
- that we want um and so you know here
- we're
- we're making a new data frame which is
- just pulling out the reservation price
- and percentage active columns and
- multiplying them by 0.8
- we can take that same one and we can
- pass it to the same
- styler function and we see that style
- now in the same way
- then there are also a few other
- shortcuts that pandas has
- added that are nice to know about you
- can
- easily highlight the maximum value in a
- column
- or the minimum value and you can
- pass similar arguments like we did for
- the bars color
- a subset
- same thing on the minimum you can pass
- those same kind of values
- background gradient is a really nice one
- for
- just giving a color map
- to the cells highlighting them based on
- their values
- and you can pass what specific type
- of color map that you want i mean if you
- want to figure out what
- are the possible
- values that i can use in that you can
- google matplotlib color
- maps again remember that pandas uses
- matplotlib under the hood for
- all these things and you can see
- this lists out all the
- possible
- color maps that you can use but the one
- that we'll
- focus on here and in the course most often
- is
- this uh red yellow green color map
- um and you can do underscore r to
- reverse any color map
- so uh red yellow green
- is going to be the highest values in
- green the lowest values in red and a
- gradient in between
- and with the underscore r then it's the
- opposite highest values in red lowest
- values in green
- and a gradient in between
- and we'll see this definitely is a
- really nice way to be able to get
- a nice quick way to see where the
- largest and smallest values in a table
- are so that's an overview
- on pandas styling we're going to come
- back next time
- to learn about graphing with pandas
- and python so thanks for listening and
- see you next time
Introduction to Graphs in Python with Pandas¶
Notes¶
All the main graph types that you would expect are available in Pandas
See the official Pandas visualization guide on how to adjust any plots to your liking, but the defaults are already pretty good
Transcript¶
- hey everyone
- nick derobertis here teaching you
- financial modeling today we're going to
- be talking about
- creating graphs in python using pandas
- this is part of our lecture series on
- understanding complex
- results through visualization
- so we learned in the last couple videos
- how we can use pandas in python as
- a way of putting our data into a tabular
- format and
- the other main usage that we're going to
- have for pandas is then to produce
- graphs from those data so
- now that we already have a data frame to
- work with
- it's quite easy to produce these plots
- so here's a quick example
- of doing a plot in pandas
- and really it comes down to one line of
- code if we already have our data frame
- variable
- then you do dot plot and then dot
- whatever type of plot that you want to
- do and you tell it what are the x and y
- axes uh giving it the column names
- of the data in your data frame
- and that's that's it then it will
- produce the plot for you
- um so quite straightforward the other
- part that you see here is this
- matplotlib
- inline thing don't worry too much about
- that
- that's just something for jupyter
- notebooks
- that allows it to display these plots
- appropriately in all cases
- so you just add this thing at the top of
- your notebook you can put it with your
- imports
- just do it one time just have it there
- at the top and then
- everything going forward you just do uh
- dot plot
- dot whatever to get your actual plot
- and for any of the main chart types that
- you expect
- you have that in pandas um
- so for all of the basic
- plot types um you know we pretty much
- have
- a python version which is analogous to
- the excel version
- um so here you know seeing a
- line plot and uh created from
- pandas versus a line plot in excel
- we have these column or bar charts
- uh you can do box and whisker or
- whatever really else
- kind of all the basic general plot types
- so let's go look at an example of this
- so coming over to the intro to graphics
- jupyter notebook
- so first thing we're going to do is
- import pandas
- and we're going to run this percent
- matplotlib inline thing
- again don't need to worry much about
- what that is just do it once at the top
- of your notebook
- before we can plot something with pandas
- we have to have a data frame which has
- data in it
- so we'll go ahead and just create a data
- frame here so
- going with the style of creating an
- empty data frame and then assigning
- columns
- so here assigning uh some values
- this could be like a stock price over
- time
- uh and then get those time values
- um so now we've got a data frame here
- and here i'm actually doing what's
- called a list comprehension to
- produce those values that
- is basically creating this list we will
- cover this more later in the course you
- don't have to worry about it for now
- just know that all we're doing is
- assigning lists to create these columns
- so now that we have this data in this
- data frame
- we can plot it so you can just call
- just plot by itself without telling it
- the type of plot
- and it's going to try to do the best job
- it can
- in producing a plot so this is kind of
- analogous to
- the recommended charts in excel it's
- going to you know try to give you a
- recommended plot
- here in python um
- but we can see this is probably not what
- we wanted
- uh we want to see these values over time
- and right now it's plotting both the
- values and time
- um and it's going against this index
- this
- zero to 10 thing here which is why the
- first value we see
- is zero um so what we can do is we can
- tell it
- the y and x values and it can do
- a better job now knowing more about what
- we want
- so we know we want to have time on the
- x-axis and we want to have the values on
- the y-axis
- so we just pass that to the plot command
- values on the y and we want
- the t on the x the names the same names
- of the columns in this data frame
- and now this looks more like what we
- might have expected
- we have just the values plotted it's
- over time
- we already got an axis label for this
- automatically as well
- and we can see this axis represents
- uh values and now the axis has been fit
- well to the range
- of the values as well rather than
- you know having to try and fit both of
- these completely different
- uh scales on the same graph
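A sketch of building the data frame and plotting it. The numbers are invented, and the `Agg` backend line is an assumption so the code also runs outside a notebook; inside Jupyter you would put `%matplotlib inline` at the top instead.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; in Jupyter use %matplotlib inline instead
import pandas as pd

# Start from an empty data frame and assign columns as lists
df = pd.DataFrame()
df["values"] = [100 + 3 * i - (i % 3) * 4 for i in range(10)]   # made-up "prices"
df["t"] = [i for i in range(10)]                                # time steps 0..9

# With no arguments, plot() draws every numeric column against the index
ax_guess = df.plot()

# Naming the columns puts time on the x-axis and only the values on the y-axis
ax = df.plot(y="values", x="t")
```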
- um and so that we just did with plot we
- said
- try and guess at the plot that we want
- um but you can also explicitly tell it
- what type of plot that you want
- and to figure out what type of plots
- are available you can
- after df.plot then do another dot and
- hit
- tab and you'll see the different
- possibilities come up
- so we can do all these different kinds
- of plots
- so these represent most of what you
- would probably expect to see
- as far as plot options
- and here's just them printed out in the
- notebook you don't have to worry about
- the code that does that that was just so
- i could get them printed here
- so let's look at a few examples so
- we can do a df.plot.area again
- passing the same y and x
- and that's the same kind of plot but it
- puts the full
- you know fills in the area under the
- line
- we got bar graphs dot bar
- uh again looks like we would expect
- shows the values over time
- and you have bar h to do a horizontal
- bar
- plot so same thing as the bar graph
- just now
- rotated we can do a box and whisker plot
- with dot box
- the box and whisker plots and the
- density and histograms those are all
- good
- for giving summaries of the distribution
- of the data
- so now we can see
- uh you know the full range of the data
- the interquartile range
- plotted on here so that gives a nice
- summary
- we can do you know both density and
- histogram are kind of getting at the
- same concept
- we want to see the distribution of
- the data across all the different values
- how frequently are we getting different
- values
- so the density is just a smooth
- version of the histogram so the
- histogram
- just puts things into buckets says you
- know this
- uh bucket is around 92.5 and then
- from like 93 to 95 is this bucket
- uh how many values fall within that
- range well three are within like the 93
- to 95 range
- and so you can see how many values occur
- in each of these
- ranges so the density plot is kind of
- the same thing
- we can see something similar here
- only it's like smoothed out
- across uh all the different values
- rather than splitting it into buckets
- and pie chart doesn't really make sense
- for these data
- but you can certainly do that um
- and if you had data where it was more
- appropriate
- then it's a good option uh such as
- frequencies or percentages for different
- things
- and we see all the values laid out in a
- pie chart here
- scatter plots the individual points
- and so this
- kind of wraps up all the different basic
- plot types that you would probably be
- using
- in this class and now of course there's
- way more that you can do
- by going to the base matplotlib that
- powers
- pandas plots but that's definitely
- getting outside the scope of this course
- we just want to be able to quickly get
- a good representation of our data
- and pandas gets us all the way there
- and if you want to customize your plots
- in any way
- just google about matplotlib and
- matplotlib customization there's a whole
- lot there
- and you can then take these pandas
- plots
- and add on that styling afterwards and
- customization afterwards
- so that's an overview on doing uh
- graphing with pandas and python
- next time we're going to come back and
- apply this
- uh this graphing as well as table
- visualization
- to the dynamic salary retirement model
- in python
- so thanks for listening and see you next
- time
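The plot calls walked through in this lecture can be sketched roughly as follows. This is a minimal illustration, not the lecture's notebook: the numbers in the `t` and `values` columns are made up, and the variable names for the returned axes are assumptions added so each plot can be inspected.

```python
import pandas as pd

# Made-up data for illustration: a time column and a values column,
# each created by assigning a list (built with a list comprehension).
df = pd.DataFrame()
df["t"] = [i for i in range(11)]
df["values"] = [100 - 0.1 * i ** 2 for i in range(11)]

# With no arguments, pandas plots every column against the index,
# so both t and values show up as lines.
ax_default = df.plot()

# Naming the columns for each axis gives values over time.
ax_line = df.plot(y="values", x="t")

# Explicit plot types hang off df.plot as methods.
ax_area = df.plot.area(y="values", x="t")
ax_bar = df.plot.bar(y="values", x="t")
ax_barh = df.plot.barh(y="values", x="t")
ax_scatter = df.plot.scatter(x="t", y="values")
ax_pie = df.plot.pie(y="values")

# Distribution summaries only need the values column.
ax_box = df[["values"]].plot.box()
ax_hist = df[["values"]].plot.hist()
# df[["values"]].plot.density() works the same way but needs SciPy installed.
```

In a Jupyter notebook each call renders its own figure; in a plain script you would typically call `matplotlib.pyplot.show()` afterwards.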
Visualization in Python Example¶
Notes¶
If you have structured your model well, it should be easy to add visualization at the various stages of your model
Visualizations are especially helpful in Python as you don’t automatically see the tabular representation of the data like you do in Excel
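As a rough sketch of how a model can produce that tabular view, results from a loop can be collected as a list of tuples and turned into a DataFrame in one step. The `ModelInputs` fields, the `salary_at_year` and `wealth_at_year` formulas, and the function name `get_salaries_wealths_df` are simplified stand-ins assumed for illustration; they are not the course's completed model.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ModelInputs:
    # Hypothetical inputs, chosen only so the example runs.
    starting_salary: float = 60000
    promos_every_n_years: int = 5
    cost_of_living_raise: float = 0.02
    promo_raise: float = 0.15
    savings_rate: float = 0.25
    interest_rate: float = 0.05
    desired_cash: float = 1500000


def salary_at_year(data, year):
    # Simplified salary: cost of living raises every year plus periodic promotions.
    num_promos = year // data.promos_every_n_years
    return (
        data.starting_salary
        * (1 + data.cost_of_living_raise) ** year
        * (1 + data.promo_raise) ** num_promos
    )


def wealth_at_year(data, year, prior_wealth):
    # Simplified wealth: last year's wealth grows, then this year's savings are added.
    cash_saved = salary_at_year(data, year) * data.savings_rate
    return prior_wealth * (1 + data.interest_rate) + cash_saved


def get_salaries_wealths_df(data):
    # One list of (year, salary, wealth) tuples, created before the loop.
    df_data_tups = []
    wealth = 0
    year = 0
    while wealth < data.desired_cash:
        year += 1
        salary = salary_at_year(data, year)
        wealth = wealth_at_year(data, year, wealth)
        df_data_tups.append((year, salary, wealth))
    # Build the DataFrame all at once, naming the columns.
    return pd.DataFrame(df_data_tups, columns=["Year", "Salary", "Wealth"])


model_data = ModelInputs()
df = get_salaries_wealths_df(model_data)

# Because the function accepts any inputs, other scenarios are one call away.
high_roller_df = get_salaries_wealths_df(ModelInputs(starting_salary=100000))
```

Appending one tuple per iteration keeps a single list to maintain no matter how many columns are tracked, which is the main advantage over keeping a separate list per column.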
Transcript¶
- hey everyone
- nick derobertis here teaching you
- financial modeling and today we're going
- to be going over an example of how to do
- visualization in python as part of our
- lecture series on
- understanding complex results through
- visualization
- so we're going to add visualization
- to the dynamic salary retirement model
- which we've already built out in the
- course
- so i'm going to pull that up here
- and this is the same one which is
- available as a completed example
- on the course website so
- um we already have the retirement model
- and i'm just going to restart the kernel
- and
- run all cells so that we have everything
- defined
- and so we already have where we got to
- ultimately the result of
- it's going to take 28 years to retire
- and we just printed out strings
- of the wealth over time
- so now we want to do two things
- to visualize our results in a better way
- one is it would be good to have a table
- of
- these salaries and wealths over time so
- that you can see both the salary and
- wealth together
- and each of those over time in a nice
- format
- and then the other uh thing we want to
- do to visualize
- is graph the salary and graph the wealth
- over time so
- we can create a new section of the model
- let's call this results summary
- and you would want to put some
- description of that
- here you can take a look at the
- completed example
- on the course website for having this
- thing fully polished with all the
- descriptions doc strings for functions
- etc
- we're just going to kind of build it out
- quickly here for sake of time
- so first thing we want to do
- is we want to create a data frame a
- pandas data frame
- which has our results so coming back to
- the top
- uh you'll notice that we don't yet have
- an import for pandas so we're going to
- import pandas
- as pd and then run this cell so that we
- can actually work
- with pandas the other thing that you'll
- notice because this is
- kind of the polished completed model is
- we don't have
- the uh setting model data to data so i'm
- going to add that
- back here so that we can develop our
- functions
- easily outside the function in the cell
- and then wrap it up into a function
- which accepts data
- and we're not going to accidentally use
- this global
- model data so
- coming back down to the bottom now i
- want to write out the logic which is
- going to
- put the uh salaries and wealths into a
- data frame
- so to get there i'm going to start from
- the logic that we had before
- so i'm going to copy and paste that down
- to here
- highlight that all and hold shift and
- press tab to
- unindent that and then
- you know if we take out the return part
- uh
- we run this it does the same thing as
- that function right
- but we don't care about the print
- display so i'm going to remove all the
- prints here
- i'm just hitting ctrl x that would be
- command x on a
- mac to just remove a line
- and so now uh we
- have kind of the base logic without it
- displaying anything
- so what we want to do is we want to
- store
- the salary and wealth results
- in each year so that we can put them all
- into a data frame
- and we don't actually have the salary
- separately in here within the wealth
- year function it
- is determining a salary um so we're
- going to want to also
- separately calculate the salary so
- that was the salary year function that
- got us there
- and to that we passed the data and also
- the year
- so now we have uh calculating both the
- salary and the wealth
- in any given year
- so now we've got to store that data so
- we can create a data frame
- and any time where you have you know
- some kind of loop
- and you're ultimately going to want to
- put the results of that loop into a
- table
- and you have multiple different columns
- that you're going to want to have in
- that table that are all coming out of
- this same
- loop then i recommend using the
- structure of
- creating a list of tuples where
- you can then create the data frame all
- at once from that list of tuples
- you can certainly do the approach of
- creating the empty data frame and
- assigning columns
- but then you have to maintain separate
- lists for each one of the inputs which
- gets a little bit
- tedious um so first
- let's look at the recommended approach
- and then i'll quickly show
- the other approach just to show
- why it has drawbacks so
- my recommended approach is to create
- your
- list which is going to store all the
- tuples
- so you create that before the loop and
- then at the end of the loop
- we can append to that list and we're
- going to append
- a tuple which has the year the salary
- and the wealth all in it all three items
- together
- um and so then we can look at that
- df data tuples and we can see that gets
- us you know for each year we have the
- salary
- and the wealth so then we can
- create a data frame from that
- df is a pd.DataFrame from
- the list of tuples and we pass it the
- names of our columns
- so that would be year salary and
- wealth
- and then we can look at the data frame
- and then we see all this
- in the data frame format instead
- so definitely um i'm just going to go
- ahead and
- copy this cell so that we still have our
- you know good solution here and to show
- you the other way
- where we first create the empty data
- frame and then we want to assign the
- columns
- to do that you would have to have
- separate lists for
- each of your values so years salaries
- uh wealth and then you would do
- uh you know years dot append year
- you would do salaries
- uh dot append salary and you would do
- wealths dot append wealth
- and then you would do df
- year equals years df
- wealth equals wealths
- and df salary equals
- salaries so this will produce the same
- exact
- data frame but this
- is definitely a little tedious to create
- individual lists for each one of those
- inputs
- and assign those individually when you
- can just
- do it all together here single list
- append the tuple
- just give it the column names and you're
- done and this
- will continue to scale for as many
- different things as you want to track at
- once in a single loop
- so now we have some logic which can
- produce
- our data frame let's wrap this up in a
- function
- so let's call this um get
- salaries wealth df
- and it takes the data and now we can
- just indent all this
- and add the return before the df
- and then when we call this um get
- salaries wealth df
- on the model data then we can see we get
- the same
- data frame coming out of that and so now
- because we set that up with general data
- you can
- give it any values you want say
- you're a high roller your starting salary
- starts
- at 100 000 now we can see
- the data frame of all those results over
- time
- based on whatever different
- values we want to give to it so that's
- why we make sure
- to
- structure everything in this way where
- it can take any arbitrary data
- and not always the original model data
- but anyway so now we can
- get a data frame which contains our
- salaries and wealth over time
- with a single command
- so then
- maybe say below this in the next cell
- we'll just define that here
- and then we can do uh df equals that
- on the model data
- so then again we have that data frame so
- the next thing that we might
- want to do or we definitely want to do
- if we're showing this off to the reader
- is to add some number formatting here uh
- we don't need we don't care about these
- decimals and
- we don't really want the scientific
- notation coming in here
- and these are all uh dollar amounts so
- it'd be nice to have the dollar sign
- on those values so
- um below this we can
- write out what we're going to do so
- df.style.format
- then you pass the dictionary where you
- give it the column names
- and then as the values
- is how you want to format that column
- so i want to put a dollar sign in front
- and i want to have zero decimal places
- and i want to have commas
- and i want that to be a fixed zero
- decimal places
- and then um want to do the same thing
- for the wealth the same format is going
- to be fine there
- as well
- so then when i do that then we can see a
- much nicer
- representation of those values over time
- another thing which we could possibly do
- here to help understand these results
- is to add the
- inline bar graph here um so we can do it
- we don't want it on the year column
- we just want it on the salary and wealth
- columns
- um and then we can see you know it makes
- it a lot easier to see those jumps for
- the promotions right we see this
- immediate break in
- the length of the bars representing that
- the salary or wealth
- jumped and
- we might also want to hide the index
- here
- the index is not useful here
- and now we've got a pretty nice display
- of these data
- so now what i recommend just always for
- our data frame
- styling is we want to wrap this in a
- styler function
- so let's
- style salaries wealth can be the name of
- the function
- and it takes a data frame and then i'm
- going to
- indent all of that and then i'm going to
- return the result of that
- so then that's defined
- so then we can
- uh below this then we're getting our
- data frame
- and then we want to style the
- data frame and that's what we're looking
- at
- so now two lines of code here off of any
- set of data that we want
- and we can get this nice styled
- uh representation of the salaries and
- wealths over time
- so then the other thing that we want to
- do is
- plot our results so
- we have our original data frame still
- and this has the original values it's
- not the styled one you can't do any
- plotting on the styler object
- only on the data frame itself and we
- want to do a plot
- and let's do a line plot and we want our
- x to be the year and
- let's uh just plot the salary
- right now so there we can see the
- salaries
- over time now i could have
- plotted both the salary and wealth on a
- single graph
- but we can see there's definitely a
- scale problem here
- where the wealth has a much larger
- scale
- than the salaries and so it doesn't
- make a lot of sense to put these on the
- same graph so we can have the salaries
- and then we can have the wealth separate
- from that so
- we can plot the wealth here separately
- so line plots are perfectly good
- for this you know you could
- do something like a bar or an area
- plot instead but i think the line
- is perfectly fine for this
- um and now we have
- a nice display of all of our
- information in the model
- so that's an overview on how to add
- visualization
- to an existing python model using pandas
- and this uh concludes our lecture series
- on understanding complex results through
- visualization
- so thanks for listening and see you next
- time
Lab Exercises¶
Notes¶
Complete all the exercises in the Pandas and Visualization Labs Jupyter notebook
Resources¶
Transcript¶
- hey everyone
- nick derobertis here teaching you
- financial modeling and today
- i'm going to be quickly going over the
- lab exercises
- for the section of the course on
- understanding complex results
- with visualization so the
- lab exercises here all center around the
- python side
- and they are represented here on slides
- in three different exercises
- ones related to getting started with
- pandas ones related to
- styling pandas data frames and ones
- related to
- graphing with pandas now all of these
- are within the uh pandas and
- visualization
- labs that you can get from the course
- website
- or the link here in the slides
- and so that's all this jupyter notebook
- here
- pandas and visualization lab exercises
- so you'll want to complete all the
- exercises which
- are within here here we have four
- exercises
- on intro to pandas four on styling and three
- on the graphing and they kind of go
- together in that
- a lot of them use the result of
- a prior one
- to continue forward so make sure to
- start from the beginning
- so that's a quick overview on the lab
- exercises
- for understanding complex results with
- visualization
- thanks for listening and see you next
- time