Understanding Complex Results

Learn how to analyze and communicate complicated results using visualization.

Introduction to Visualization


Notes

  • Visualization is a key modeling concept as often we have many different outputs to understand, but humans are terrible at getting understanding by looking at lots of numbers

  • Thoughtfully creating appropriate visualizations will allow someone to glance at your model and gain immediate understanding at a much richer level

  • Tables are a more primitive form of visualization which lay out the numbers in a better format, while charts/graphs can summarize a lot of numbers in one picture

  • For the most part, visualization in Excel is straightforward: insert chart and follow the prompts. Your numbers should already be in tables.

  • Python, being open-source and developed by the community, has a dizzying array of options for visualization. There is far more than you can do in Excel, including interactive plots, but it is generally a bit more complicated to work with

  • In this course, we will focus on Pandas (powered by matplotlib) to produce graphs simply

Transcript

  • 00:03: hey everyone
  • 00:04: nick duraburtis here teaching you
  • 00:05: financial modeling and today
  • 00:08: is the first lecture in our next lecture
  • 00:10: series on understanding complex
  • 00:12: results digging into visualization
  • 00:16: so the lecture today is an introduction
  • 00:19: to visualization just generally talking
  • 00:22: about
  • 00:22: why we want to do visualization when
  • 00:24: it's useful
  • 00:26: and the overview of what it looks like
  • 00:28: in both excel and python
  • 00:31: so we when we think about visualization
  • 00:35: uh it's a way of getting understanding
  • 00:39: of more than one number at once
  • 00:43: and in our models so far we've just had
  • 00:46: one main output
  • 00:47: thinking about the salary model the
  • 00:50: dynamic salary retirement model that
  • 00:51: we've built out we've had the number of
  • 00:53: years to retirement
  • 00:55: as our main output from the model
  • 00:58: but we've also had salaries over time
  • 01:01: and wealth over time as
  • 01:02: outputs but each of those represents a
  • 01:06: lot of different numbers
  • 01:07: and so it's a little bit difficult to
  • 01:10: just uh only with the numbers
  • 01:13: present that in a way that someone can
  • 01:16: very easily
  • 01:17: understand what it looks like over time
  • 01:20: at a quick glance so that's where
  • 01:23: visualization becomes useful
  • 01:25: is any time where you have multiple
  • 01:28: different
  • 01:28: numbers that you want to show some kind
  • 01:31: of summarization
  • 01:32: of that information and you want it to
  • 01:35: be
  • 01:36: in a much more digestible format and
  • 01:39: these visualizations
  • 01:40: can be very powerful for getting very
  • 01:43: quick understanding
  • 01:45: of complex results
  • 01:48: so you know they say a picture is worth
  • 01:51: a thousand words
  • 01:52: and it definitely is true uh in the
  • 01:55: context of
  • 01:56: visual visualizing our results you know
  • 01:59: humans are just
  • 02:01: really bad in general at looking at a
  • 02:03: bunch of numbers
  • 02:04: and making some kind of interpretation
  • 02:07: out of it
  • 02:08: that's something that machines are very
  • 02:09: good at but
  • 02:11: humans are bad at and so humans
  • 02:14: we're visual creatures and so we have to
  • 02:18: display our data in a visual way that
  • 02:20: makes sense to
  • 02:22: a human so thinking about the
  • 02:25: the way that we have our results so far
  • 02:28: uh in excel
  • 02:29: we have something like this which shows
  • 02:33: our salaries and wealth over time in a
  • 02:36: tabular
  • 02:37: format so that is kind of the more basic
  • 02:40: form of visualization is just to lay the
  • 02:43: numbers out in a table
  • 02:45: you already get more context than just
  • 02:48: displaying the numbers at least they're
  • 02:50: laid out in a structured way and
  • 02:53: we can see you know the salaries and
  • 02:55: wells together they're aligned by the
  • 02:57: time
  • 02:58: so that already helps substantially in
  • 03:00: understanding what's going on here with
  • 03:02: the numbers
  • 03:04: we had this in excel basically because
  • 03:07: you're always kind of working in tables
  • 03:09: in excel
  • 03:09: but we didn't even get to a table format
  • 03:12: in python yet in python we've just been
  • 03:14: printing out sentences which say
  • 03:16: you know at year three you would have 63
  • 03:19: 000 as a salary and so we haven't had a
  • 03:23: good way of displaying this information
  • 03:25: in python yet
  • 03:28: so looking at this table i mean
  • 03:30: certainly
  • 03:31: you can look at it and get some
  • 03:33: conclusions from it
  • 03:35: uh you know you can see looking at the
  • 03:38: salaries okay it's increasing a little
  • 03:39: bit year by year
  • 03:41: and then when we hit year five we have
  • 03:42: this jump here
  • 03:44: that jump representing a raise from a
  • 03:46: promotion
  • 03:48: uh versus the cost of living races that
  • 03:50: come
  • 03:51: every other year so
  • 03:54: you can definitely see that but it takes
  • 03:56: some time
  • 03:57: looking at the numbers to understand
  • 04:00: that
  • 04:01: you can't just immediately glance at
  • 04:02: this and understand okay every five
  • 04:04: years
  • 04:05: the there's a promotion the salary is
  • 04:07: going to jump for the promotion
  • 04:09: you can see that if you just look at the
  • 04:10: numbers one by one and identify these
  • 04:12: patterns but
  • 04:14: it's not immediately obvious just
  • 04:16: looking at the numbers
  • 04:17: and that's where visualization can be
  • 04:20: really helpful
  • 04:21: and we'll see examples of what this
  • 04:23: looks like
  • 04:24: visualized so when we think
  • 04:28: about how to visualize things in excel
  • 04:31: i mentioned that we're already working
  • 04:33: in tables we probably already have our
  • 04:35: numbers
  • 04:35: laid out in this kind of format so
  • 04:38: that's already
  • 04:39: going down the tabular direction of
  • 04:41: visualizing
  • 04:42: data so we have that in place
  • 04:45: then what we can add on top of that is
  • 04:47: charts and graphs and so
  • 04:49: in excel that really all lives in one
  • 04:52: spot
  • 04:52: you just go and hit insert chart and
  • 04:56: you pop into this kind of menu here and
  • 04:59: you just look through the different
  • 05:00: possible charts
  • 05:01: for your data and select the one that is
  • 05:04: appropriate
  • 05:06: and that's pretty much the end-all
  • 05:08: be-all of excel visualization
  • 05:11: there's quite a bit of customization
  • 05:13: that you can do within the charts but
  • 05:15: everything kind of lives within this
  • 05:17: insert chart
  • 05:18: and then modifying the chart that it
  • 05:20: generates
  • 05:23: in python things are not so
  • 05:25: straightforward
  • 05:27: there are a lot of options for how to
  • 05:30: visualize things in python and that's
  • 05:33: because
  • 05:34: python is an open source language which
  • 05:36: is developed
  • 05:37: by the community millions of people
  • 05:40: across the world
  • 05:42: uh are out there building different
  • 05:44: solutions for how to visualize
  • 05:46: anyone can go and create a new way to
  • 05:49: visualize things
  • 05:50: in python and so many people have done
  • 05:52: that
  • 05:53: and so you have all these different
  • 05:54: packages and all these different
  • 05:56: ecosystems
  • 05:58: of how to visualize things in python
  • 06:00: that have different advantages and
  • 06:01: disadvantages
  • 06:03: and the vast majority of this is
  • 06:05: definitely outside the scope
  • 06:07: of this class you know we could spend an
  • 06:09: entire semester
  • 06:10: just looking at visualization in python
  • 06:14: but we're trying to teach financial
  • 06:15: modeling here so we're going to take
  • 06:17: what is easy and can get us quickly
  • 06:21: to some reasonable charts and graphs and
  • 06:24: just kind of recognize that these other
  • 06:25: options are out there
  • 06:27: and you can expand into using them but
  • 06:29: you don't have to do that you can get
  • 06:32: pretty good results with just what we're
  • 06:34: focusing on this class
  • 06:36: which is going to be using pandas and
  • 06:39: matplotlib for our visualizations
  • 06:43: but there are a lot of other cool things
  • 06:44: you can do a number of these
  • 06:47: out here um are solving some
  • 06:50: interesting problems and presenting data
  • 06:52: in new and interesting ways
  • 06:53: such as interactive plots where you can
  • 06:56: actually like zoom in
  • 06:57: to the plot and maybe hover over points
  • 07:00: and it will show you more information
  • 07:02: about the point
  • 07:03: and other kinds of interactive features
  • 07:06: uh they're very
  • 07:07: very cool and very interesting ways to
  • 07:09: think about visualizing data that have a
  • 07:11: lot of advantages
  • 07:12: but we're not going to dig into all that
  • 07:15: in this course
  • 07:16: i would recommend you to take a look at
  • 07:19: some of those things
  • 07:19: outside this course such as bokeh and
  • 07:22: hollow views
  • 07:23: are to the ones that
  • 07:26: i've used for interactive plots and
  • 07:29: they're very
  • 07:30: useful but we're just going to focus on
  • 07:34: penis and matplotlib in this class
  • 07:38: and why are we focusing there so
  • 07:42: matplotlib is it was kind of one of the
  • 07:45: first
  • 07:46: ways to visualize data in python and
  • 07:50: a lot has been built around matplotlib
  • 07:52: as kind of a basic system
  • 07:54: for producing charts and graphs
  • 07:58: and matplotlib is going to kind of be in
  • 08:02: the background
  • 08:03: for us really we're going to use pandas
  • 08:06: to
  • 08:06: directly generate the plots and graphs
  • 08:10: whereas pennis actually uses matplotlib
  • 08:14: under the hood we generally don't need
  • 08:16: to think about that very much
  • 08:18: most of the time we're just using pandas
  • 08:21: but i do mention
  • 08:22: here that it's all based on my
  • 08:23: matplotlib because
  • 08:25: matplotlib is extremely customizable and
  • 08:29: you can do all the same customizations
  • 08:31: on your panties plots
  • 08:33: as you can with matplotlib so basically
  • 08:36: pandas is going to be our really simple
  • 08:38: way
  • 08:39: of just quickly creating some kind of
  • 08:41: graph to show off our data
  • 08:43: and then if you need to do absolutely
  • 08:45: anything with customizing it
  • 08:47: then you can go to using matplotlib to
  • 08:50: make those adjustments
  • 08:52: so most of the time you're not going to
  • 08:54: need to really think about matplotlib
  • 08:56: but if you want to customize your graph
  • 08:58: in some way
  • 08:59: then you can just google about how to do
  • 09:01: that in
  • 09:03: matplotlib and you'll be able to apply
  • 09:04: that to your pandas plot
  • 09:09: so that's the overview of financial
  • 09:12: modeling visualization and how we're
  • 09:14: going to focus in this course
  • 09:16: we'll come back next time to go through
  • 09:18: an example of
  • 09:19: doing visualization in excel so thanks
  • 09:23: for listening
  • 09:23: and see you next time

Visualization in Excel Example


Notes

  • Recommended Charts is a nice way to scan through a few possibilities which probably work well for your data, but take a look at All Charts if nothing seems right

  • Make sure that you have an appropriate title and axis titles for your chart so the reader immediately knows what it is about.

Transcript

  • 00:02: hey everyone
  • 00:03: nick dear bertis here teaching you
  • 00:04: financial modeling and today we're
  • 00:07: going to go over an example of doing
  • 00:09: visualization
  • 00:10: in excel as part of our lecture series
  • 00:13: on
  • 00:14: visualization and understanding complex
  • 00:16: results
  • 00:17: so we're going to work with the dynamic
  • 00:20: salary retirement model
  • 00:22: that we've built out in the first part
  • 00:24: of this course
  • 00:26: and we're going to now add visualization
  • 00:28: to the excel side
  • 00:30: of that model so i'm going to jump over
  • 00:33: to
  • 00:33: the excel for the dynamic salary
  • 00:36: retirement model
  • 00:39: so we already have our main output
  • 00:42: coming here used to retirement but it
  • 00:44: would be nice to be able to see here
  • 00:46: also
  • 00:46: what the salaries look like over the
  • 00:48: time what do the wealth look like
  • 00:50: over time and being able to see that
  • 00:53: very quickly in a very digestible format
  • 00:57: of course we could just bring over
  • 01:00: the salaries and the wealth into a table
  • 01:02: and
  • 01:03: show that on the inputs and outputs tab
  • 01:06: as well but it's not going to be easy to
  • 01:09: just glance
  • 01:10: at these numbers and understand patterns
  • 01:13: in those numbers over time
  • 01:15: it would take a lot of really closely
  • 01:19: looking at the numbers to understand
  • 01:20: those patterns
  • 01:22: so instead let's produce some
  • 01:24: visualizations of those numbers
  • 01:26: so looking at the salary we want to make
  • 01:30: a plot of the salary over time
  • 01:32: and all of that is going to come from
  • 01:36: insert and then this charts section of
  • 01:38: the insert
  • 01:40: tab in the ribbon and the first thing
  • 01:44: that we want to do
  • 01:44: is we want to highlight the data that we
  • 01:47: want to plot
  • 01:48: so we want to plot the salaries we don't
  • 01:50: care
  • 01:51: about any of these intermediate
  • 01:53: calculations we just want the time
  • 01:56: and the salaries so one way to go about
  • 01:59: that
  • 02:00: is to just select those two columns
  • 02:02: before you go to create the chart
  • 02:05: so the easiest way to do that start over
  • 02:08: here
  • 02:09: and then on windows it's going to be
  • 02:11: holding shift and
  • 02:12: control on mac i believe it's shift and
  • 02:15: command
  • 02:16: and then you press down and it's going
  • 02:18: to highlight that entire column for you
  • 02:21: and then you can come over
  • 02:23: here to the salary cell hold control
  • 02:26: that would be command on mac
  • 02:28: and click and now you can see we've got
  • 02:30: this whole column as well
  • 02:31: as this cell and now that our last
  • 02:34: selection was on this cell we can do
  • 02:36: that same
  • 02:36: shift and control or command trick hit
  • 02:39: down again
  • 02:40: and now we have both of these two
  • 02:42: columns selected
  • 02:44: so from there i always like to go to
  • 02:47: recommended charts
  • 02:48: first because that will usually tell you
  • 02:51: the best charts
  • 02:53: for your particular data
  • 02:57: and you can look through the recommended
  • 02:59: charts for what you think
  • 03:01: is appropriate for your data
  • 03:04: but then if you don't see something that
  • 03:06: you like then you can go over to this
  • 03:08: all charts tab
  • 03:10: and explore through all the different
  • 03:12: options that you have
  • 03:15: but here the recommended chart did bring
  • 03:18: up
  • 03:18: a nice representation for this data
  • 03:20: which is a simple
  • 03:22: line chart so here we can see the
  • 03:24: salaries
  • 03:25: over time going up so let's go ahead and
  • 03:30: hit ok
  • 03:30: to insert that chart so now we can see
  • 03:33: we have this chart over here
  • 03:37: now i said that we wanted to have the
  • 03:40: chart over on the inputs and outputs tab
  • 03:42: but now i have it on the salary tab
  • 03:44: because that's where the salaries were
  • 03:46: so the reason that i made it over here
  • 03:49: is because it's very easy now to just
  • 03:51: click this
  • 03:53: and i'm going to hit control x that
  • 03:56: would be command x on mac
  • 03:58: you can see it disappears because i have
  • 04:00: now cut
  • 04:01: the plot and now i can paste it
  • 04:04: controller command v over here and
  • 04:06: everything stays
  • 04:08: linked together so you know now if i
  • 04:10: change
  • 04:11: how often the promotions are then you'll
  • 04:13: see that immediately changing in the
  • 04:15: graph as well
  • 04:16: everything is still linked together
  • 04:19: and this i think is generally easier
  • 04:22: than trying to
  • 04:23: start a chart on this page and then
  • 04:24: reference over
  • 04:26: to another page to get the data
  • 04:29: uh one other way that we could have
  • 04:31: selected the data
  • 04:32: to produce the plot is just to select
  • 04:35: everything
  • 04:36: and then uh insert the chart and then
  • 04:40: just let it be created and then after
  • 04:43: it's been created
  • 04:44: you can then see uh
  • 04:48: here where we can actually adjust the
  • 04:50: data which is included
  • 04:52: so now same way it also only includes
  • 04:55: the salary
  • 05:00: so now we have this chart of salary and
  • 05:04: we can see the salary over time this is
  • 05:07: already a lot more clear what's going on
  • 05:09: we can see those discrete jumps
  • 05:11: every time that there's a promotion
  • 05:14: one thing that's not so clear just
  • 05:16: looking at this is
  • 05:18: what what this axis means
  • 05:22: since the plot says this is salary it's
  • 05:25: fairly clear that this is a line of
  • 05:26: salary
  • 05:27: and these values represent the amount of
  • 05:29: the salary
  • 05:30: but here it's not clear what these
  • 05:32: numbers mean so
  • 05:34: you definitely want to add an axis title
  • 05:37: for this
  • 05:38: so this plus over here that comes up
  • 05:40: whenever you click on the graph
  • 05:42: is how you can easily add additional
  • 05:45: elements to the chart and here the
  • 05:46: element we want to add
  • 05:48: is going to be a primary horizontal axis
  • 05:52: title so we add that and then we can
  • 05:55: double click in here
  • 05:57: to change out the name of that and this
  • 06:00: is going to be
  • 06:02: uh time in years
  • 06:06: and that way it's a lot more clear what
  • 06:08: this actually
  • 06:09: represents so
  • 06:13: now this looks like a pretty good plot
  • 06:14: for salary
  • 06:16: but maybe we wanted to display this in a
  • 06:18: different way
  • 06:19: so it is possible to change the chart
  • 06:21: type after you've created it
  • 06:23: also uh it's possible to copy plots
  • 06:27: so that you can you know play around
  • 06:29: with different versions
  • 06:30: until you settle on something that you
  • 06:32: like so i'm going to go ahead and just
  • 06:34: copy ctrl
  • 06:35: c and then paste ctrl v this plot
  • 06:39: now we can see we have the same exact
  • 06:41: plot everything
  • 06:43: is still going to be linked together to
  • 06:45: the model
  • 06:48: and now we can click on this plot and we
  • 06:51: can go to the chart design tab
  • 06:53: and that allows us a few different
  • 06:55: things here
  • 06:56: one it's very easy to pick a different
  • 06:58: style
  • 06:59: for the plot just up here
  • 07:03: and the other is uh you can change the
  • 07:06: chart type here as well
  • 07:08: so you know instead of a line maybe we
  • 07:10: want a scatter plot
  • 07:12: with a with connected
  • 07:15: lines then it's more clear that we have
  • 07:17: observations for each year
  • 07:19: and not necessarily you know completely
  • 07:22: filled out
  • 07:23: in this axis um or maybe
  • 07:26: instead you wanted to represent this
  • 07:29: with
  • 07:32: maybe something like
  • 07:36: columns instead that can get at the same
  • 07:40: exact
  • 07:40: kind of concept so it's
  • 07:44: easy after you've already created the
  • 07:45: chart to go back and switch
  • 07:47: the type of chart as well
  • 07:52: so now that we've seen how to do that
  • 07:54: let's go ahead
  • 07:55: and bring over the chart for
  • 07:58: wealth as well so coming over to the
  • 08:01: wealth tab
  • 08:02: then we just do the same thing that we
  • 08:05: did
  • 08:05: previously we're just going to
  • 08:08: grab these two columns here the time
  • 08:12: column
  • 08:12: and the wealth column once again to
  • 08:15: insert
  • 08:16: recommended charts here and let's go
  • 08:19: with the line chart here
  • 08:21: and we can see also the wealth over time
  • 08:25: i'm going to add that title for the
  • 08:28: horizontal axis that this is
  • 08:32: time and years
  • 08:36: and then going to cut this so we can
  • 08:39: bring it over
  • 08:40: to the inputs and outputs tab
  • 08:43: so now we have both the salary and
  • 08:45: wealth and of course you can
  • 08:49: you know drag these to change the size
  • 08:51: so that everything will fit
  • 08:53: on the screen appropriately
  • 08:56: and now we can see everything will
  • 08:59: change together
  • 09:00: so it's a much easier way to get a quick
  • 09:02: overview of what's going on in the model
  • 09:06: and one other way we could have gone
  • 09:08: about this is
  • 09:10: right now we have separate graphs for
  • 09:12: the salary and the wealth
  • 09:14: we could have potentially combined them
  • 09:16: into
  • 09:17: the same plot because they have the same
  • 09:20: axes right they're both
  • 09:22: over time they're both talking about a
  • 09:24: dollar amount
  • 09:25: and the y-axis so it could make sense to
  • 09:28: have them on the same plot
  • 09:30: now the reason that i didn't do that is
  • 09:32: because
  • 09:33: the scale of these axes is quite
  • 09:36: different
  • 09:36: for for the x it's the same but for the
  • 09:39: y
  • 09:40: you can see the wealth axis here is
  • 09:42: actually 10 times as large
  • 09:44: as the salary axis so if we plot these
  • 09:47: two
  • 09:48: on the same plot then we're barely going
  • 09:51: to be able to see
  • 09:52: the salary line so we can just quickly
  • 09:54: see
  • 09:55: uh the example of that if i just grab
  • 09:58: the salary and the wealth uh here
  • 10:02: to put onto one chart that
  • 10:05: um we create that chart and it basically
  • 10:08: looks like the salary doesn't really do
  • 10:10: anything
  • 10:11: it increases but it looks almost like
  • 10:13: it's totally linear you can barely tell
  • 10:15: that the promotions are going on in
  • 10:16: there
  • 10:17: whereas the wealth just totally eclipses
  • 10:19: that by the end
  • 10:20: so um doesn't really make sense to put
  • 10:24: these two
  • 10:24: on the same plot when it's so much more
  • 10:26: clear what's going on
  • 10:28: when they're on separate plots because
  • 10:30: of the different scale
  • 10:32: of the axes so that's
  • 10:35: an overview of how to do visualization
  • 10:38: in excel
  • 10:39: next time we're going to come back and
  • 10:41: learn about
  • 10:42: pandas and python which is going to give
  • 10:46: us a table kind of data structure and
  • 10:48: we're going to start working towards
  • 10:49: visualization
  • 10:50: in python so thanks for listening and
  • 10:53: see you next time

Introduction to Pandas


Notes

  • We will be using pandas to produce tables and graphs in this course though the custom DataFrame type

  • You will also find these DataFrames useful for general problem-solving purposes. Many use them as a primary way to store and work with data in their models

  • Pandas does far more than we will cover in the course. It is the top Python package for manipulating and analyzing data. I use it extensively on a daily basis.

  • In this course, with Pandas we will focus on loading and exporting data, doing math, other basic operations and summarizations, and presenting data in a tabular format

Transcript

  • 00:02: hey everyone
  • 00:03: nick duraburtis here teaching you
  • 00:05: financial modeling and today
  • 00:07: we're going to do an introduction to
  • 00:09: pandas
  • 00:11: and this is part of our lecture series
  • 00:13: on understanding complex
  • 00:15: results using visualization
  • 00:18: so we want to get to be able to
  • 00:21: visualize our results in python
  • 00:24: but there's a little bit more that we've
  • 00:25: got to learn before we can get there
  • 00:28: so in python um
  • 00:31: we talked about how there's lots of
  • 00:33: different options
  • 00:34: for how we can visualize the data
  • 00:38: but we're going to focus on using pandas
  • 00:41: which uses matplotlib under the hood to
  • 00:44: do our
  • 00:45: visualizations so we've got to go and
  • 00:47: learn about how to use pandas
  • 00:49: in the first place because it does
  • 00:52: a lot more than just plotting
  • 00:56: and tables um
  • 00:59: but we gotta learn panda's basics
  • 01:02: and the basics really resolve around
  • 01:06: the pandas data frame so the data frame
  • 01:10: is this new type of object that we
  • 01:13: haven't worked with before
  • 01:15: you know we've worked with lists numbers
  • 01:17: strings
  • 01:18: all the basic kinds of data types
  • 01:21: uh but i've mentioned how anyone can
  • 01:24: create
  • 01:25: their own data type by writing a class
  • 01:29: and lots of third-party packages out
  • 01:32: there
  • 01:33: create these custom classes for you to
  • 01:36: use
  • 01:36: which give all sorts of functionality so
  • 01:39: that has been
  • 01:40: done in the panus library they've
  • 01:41: created the data frame
  • 01:43: class and so that defines this data
  • 01:45: frame type
  • 01:47: that we can use and when you think of a
  • 01:50: data frame
  • 01:51: think of basically a table so
  • 01:55: you know with lists numbers
  • 01:58: all the dictionaries all the data types
  • 02:00: we've thought about so far in python
  • 02:02: nothing really is like a table
  • 02:06: but we have these data frames that can
  • 02:08: fulfill that role for us
  • 02:11: so before we even get into the graphing
  • 02:13: side we've got to learn about how to
  • 02:15: work with data frames
  • 02:16: with these tabular representations of
  • 02:19: the data
  • 02:20: and there's quite a lot that we can do
  • 02:22: there
  • 02:23: and i will really become a basic tool
  • 02:27: in your toolkit for solving problems in
  • 02:30: python
  • 02:31: a lot of people use data frames
  • 02:34: as a very fundamental building block in
  • 02:36: their models as a way of storing and
  • 02:38: working with
  • 02:39: data so what we're learning here has a
  • 02:42: lot of applications beyond even just the
  • 02:45: visualization
  • 02:48: so what is this data frame thing that
  • 02:51: we're about to learn about
  • 02:53: again essentially you can just think of
  • 02:54: it as a table
  • 02:56: it has rows it has columns but there's a
  • 02:59: lot that we can
  • 03:00: do with them so some of the features
  • 03:03: that
  • 03:04: data frames have you can
  • 03:07: you know even after it's created you can
  • 03:08: add or remove rows and
  • 03:10: rows and columns you can aggregate
  • 03:13: that data in a lot of different ways
  • 03:15: with summary statistics and grouping by
  • 03:17: different things
  • 03:20: you can go to and from different
  • 03:23: data formats like excel files you can
  • 03:26: read in excel files export to excel
  • 03:28: files
  • 03:29: as well as working with databases
  • 03:32: and lots of other output formats
  • 03:37: you can take multiple different data
  • 03:38: sets and you can put them together by
  • 03:40: joining and merging and concatenating
  • 03:43: you can re re-sample and reshape
  • 03:47: your data thinking about you know
  • 03:51: different frequencies you have monthly
  • 03:52: data you want to take it to
  • 03:54: annual data or
  • 03:57: other ways of reformatting your data
  • 04:01: you can slice and dice and query from
  • 04:04: your data
  • 04:05: in any sorts of ways maybe
  • 04:08: you have data on countries and you just
  • 04:10: want to get the data for the us well you
  • 04:12: can just query for that
  • 04:14: us data and you can deal with
  • 04:19: different patterns in your data such as
  • 04:22: duplicates
  • 04:23: you can remove those duplicates you can
  • 04:26: remove missing values you can fill in
  • 04:28: missing values
  • 04:30: lots of different things you can do to
  • 04:31: manipulate the data
  • 04:33: which you have in your data frames
  • 04:36: so this is why i mentioned that this is
  • 04:38: really a fundamental
  • 04:39: building block for most people as they
  • 04:42: build out their models
  • 04:44: because there's so much that you can do
  • 04:45: with it and this is
  • 04:47: really the gold standard library for
  • 04:50: working with tabular data in python it's
  • 04:53: very very popular
  • 04:55: it's used all across finance and the
  • 04:57: data science
  • 04:59: industries so most people who
  • 05:02: work in python are at least familiar
  • 05:04: with pandas
  • 05:06: and a lot of them use it on a daily
  • 05:08: basis i definitely do
  • 05:12: so what does this data frame basically
  • 05:14: look like how do we create one how do we
  • 05:16: work with it
  • 05:17: so here's a very simple example
  • 05:20: of how you can create a data frame we'll
  • 05:23: talk about
  • 05:24: a few different ways that we're going to
  • 05:26: create data frames in this course
  • 05:28: but here's what i think is the easiest
  • 05:31: way to get started with one
  • 05:32: is of course this is a third party
  • 05:36: package
  • 05:36: that we're using so we do have to import
  • 05:39: that so
  • 05:40: we're going to import pandas and the
  • 05:42: convention that everybody uses
  • 05:45: if you google anything about pandas
  • 05:47: you're going to see people using the
  • 05:48: same convention
  • 05:49: is to import pandas as pd and then you
  • 05:52: always do pd dot
  • 05:53: whatever you want to use from pandas so
  • 05:56: to create this data frame it's going to
  • 05:58: be
  • 05:58: pd.data frame and we're going to assign
  • 06:01: that to a df
  • 06:02: variable so that we can use that going
  • 06:05: forward
  • 06:07: now i had everyone install python
  • 06:10: with anaconda in this class and so
  • 06:13: pandas
  • 06:14: is already installed within anaconda so
  • 06:17: you don't have to go
  • 06:18: and install that but if you
  • 06:21: did not use anaconda you have some other
  • 06:24: python distribution that does not have
  • 06:26: pandas included then you will have to go
  • 06:28: and
  • 06:29: install that package before you can use
  • 06:31: it but as long as you installed anaconda
  • 06:33: it should be there already for you
  • 06:37: so we have this data frame and right now
  • 06:40: it's uh an empty data frame we didn't
  • 06:43: give it any data to start with
  • 06:45: so that's where we are as of here an
  • 06:47: empty data frame no data in it
  • 06:49: so then what we can do is we can add
  • 06:51: some columns to this data frame
  • 06:53: so we're going to add a sales column
  • 06:55: with these values
  • 06:57: and we're going to add a category column
  • 06:59: with these values
  • 07:01: and then when we look at the data frame
  • 07:02: we'll see something like this
  • 07:05: a tabular representation of the data
  • 07:08: where we have
  • 07:09: sales and category as our columns and
  • 07:12: then we have the rows of this data
  • 07:14: that we passed in so that's a basic way
  • 07:18: of creating a data frame first you make
  • 07:20: an empty data frame
  • 07:21: then you assign the columns one by one
  • 07:25: and we'll look at two other ways that we
  • 07:28: generally make
  • 07:28: data frames uh being that you can create
  • 07:33: it all at once with the data
  • 07:35: all within a single command
  • 07:38: and the other main way is to load from
  • 07:40: some kind of external source like
  • 07:41: reading in an excel file to create your
  • 07:44: data frame
  • 07:47: so then let's jump over to the jupyter
  • 07:50: notebook example on how we can
  • 07:55: work with these data frames
  • 07:58: so
  • 08:02: we're over here on the intro to pandas
  • 08:05: and
  • 08:06: visualization notebook and
  • 08:10: so first the thing that we're gonna do
  • 08:13: as always is we're gonna import what we
  • 08:15: need so we're gonna import
  • 08:17: pandas as pd so that we can use that
  • 08:20: throughout and so the first way that
  • 08:23: we're talking about creating a data
  • 08:24: frame
  • 08:25: we create it first assign columns later
  • 08:27: so here
  • 08:28: you make the empty data frame and if you
  • 08:30: look at the empty data frame
  • 08:32: you basically see nothing because
  • 08:34: there's no data in there
  • 08:37: then we can go and we can assign a
  • 08:39: column to it
  • 08:40: so this is going to be assigning the
  • 08:42: name column
  • 08:44: and it's giving the values joe jim and
  • 08:46: mary
  • 08:47: so then when we run that then we see
  • 08:49: this table
  • 08:50: representation with the column name name
  • 08:54: and the values joe jim and mary
  • 08:59: and
  • 09:03: the reason that we use brackets here
  • 09:07: whereas you know before we've just
  • 09:08: looked at brackets being able to look up
  • 09:11: something in a dictionary
  • 09:12: or look up something in a list
  • 09:17: classes in python define the way you
  • 09:20: work with objects
  • 09:21: and the way that pandas has defined
  • 09:23: their data frame class
  • 09:25: is why you work with it in this way so
  • 09:28: it's all
  • 09:28: in the particulars of the implementation
  • 09:31: of the class
  • 09:35: so in order to get something useful in
  • 09:37: this data let's assign a few more
  • 09:39: columns
  • 09:40: uh so for each of these people now we
  • 09:43: have a weight
  • 09:44: uh we have a price that they're willing
  • 09:47: to pay
  • 09:47: for some good the reservation price and
  • 09:51: we have a percentage of the time that
  • 09:53: they
  • 09:54: spend doing activity like outdoor
  • 09:58: activity or something like that
  • 10:01: so now we can see all of that laid out
  • 10:03: in a table
  • 10:05: in a nice clean way
  • 10:08: so the other main way to create a data
  • 10:10: frame is give it all the data at once
  • 10:13: give it all the columns at once and
  • 10:17: it's definitely a little more
  • 10:18: complicated syntax to do that
  • 10:21: but it can be useful in some cases
  • 10:24: so here we're passing
  • 10:28: to the data frame a list of tuples
  • 10:31: so here's the outer list and then each
  • 10:33: item in the list
  • 10:34: is itself a tuple
  • 10:38: and each of those tuples has the data
  • 10:41: for a single row in the data frame
  • 10:45: so here's all of joe's values here's all
  • 10:47: of jim's values here's all of mary's
  • 10:49: values
  • 10:51: and then after this first argument of
  • 10:54: the list of tuples of the data
  • 10:56: then it's a comma and then we can say
  • 10:59: columns equals
  • 11:00: and we can give it a list of the names
  • 11:03: of the columns
  • 11:04: for the data frame so those columns come
  • 11:08: directly
  • 11:08: to be the names of the columns and so
  • 11:12: doing this we can create the same data
  • 11:15: frame that we made above
  • 11:16: by assigning the columns individually
  • 11:19: just all at once
  • 11:20: in a single command
  • 11:26: so now we have uh this data frame which
  • 11:29: has a few different people
  • 11:31: and some characteristics of those people
  • 11:33: how do we
  • 11:34: now just saying we have that data frame
  • 11:36: ready how do we pull out what we want
  • 11:38: from that data frame so
  • 11:41: similar to a dictionary we can put the
  • 11:44: name of the column
  • 11:46: as a string into the brackets and that
  • 11:49: will pull that column
  • 11:51: out of the dictionary so this
  • 11:53: reservation price
  • 11:54: each of these values have now come out
  • 11:57: when we access
  • 11:58: the reservation price
  • 12:03: and what we have here is actually a
  • 12:05: series
  • 12:06: a series is the other main
  • 12:10: class within pandas and it represents a
  • 12:12: single row or a single column
  • 12:15: of a data frame
  • 12:18: so that's how we get one column
  • 12:21: you can also select multiple columns at
  • 12:23: a time
  • 12:24: by passing it the list of the columns
  • 12:28: that you want so do notice that we have
  • 12:31: the double brackets here
  • 12:32: because the outer bracket means i want
  • 12:34: to look something up in the data frame
  • 12:37: and then the inner bracket means i want
  • 12:38: to look up this list
  • 12:40: of columns so if you omit
  • 12:43: that uh brackets the inner brackets
  • 12:47: then it's not going to work
  • 12:48: appropriately you're going to get a key
  • 12:50: error
  • 12:50: because it's trying to now look up a
  • 12:53: single column
  • 12:54: with the name of this whole thing which
  • 12:57: doesn't exist
  • 12:58: you do need to have that second set of
  • 13:00: brackets so that it's saying
  • 13:01: i want to look up a list of columns and
  • 13:05: give me each of those columns
  • 13:06: individually
  • 13:09: so that's selecting columns now how do
  • 13:11: we select
  • 13:12: rows so there's uh
  • 13:16: this i look we can do on the data frame
  • 13:19: that's integer location so that's
  • 13:22: just uh saying zero would be saying give
  • 13:25: me the first row
  • 13:26: once again zero based indexing as with
  • 13:29: nearly everything in python
  • 13:31: so zeros give me the first row so we can
  • 13:34: see
  • 13:35: again this is another series that we're
  • 13:38: getting
  • 13:39: because any individual row or column of
  • 13:42: data frame is a series
  • 13:44: and that's why it displays a little bit
  • 13:45: differently um
  • 13:48: but we can see we have all of joe's
  • 13:50: values
  • 13:51: joe had the 150 weight and that's what
  • 13:54: we're indeed
  • 13:55: getting here so and you know if we went
  • 13:58: to one then that would be
  • 14:00: getting jim's values and two gets merry
  • 14:02: values
  • 14:04: and if we try to go above that we would
  • 14:06: get an index error because there are not
  • 14:08: that many rows
  • 14:09: in the data frame and that's where you
  • 14:11: get this single position indexers out of
  • 14:13: bounds
  • 14:14: you know you should see index error out
  • 14:16: of bounds you should know that you're
  • 14:18: trying to go
  • 14:19: further than exists in the data frame
  • 14:25: we can also pull out something by both
  • 14:28: rows and columns
  • 14:30: so that we can do with loc so
  • 14:34: loc uh you can give it two arguments
  • 14:37: in contrast to the ilok and so with loc
  • 14:42: we tell it first what row we want to
  • 14:44: look up and then what column
  • 14:46: we want to look up so we're going to
  • 14:47: look up the first row
  • 14:49: and we're going to look up the
  • 14:50: reservation price column so here's the
  • 14:52: data frame again just for reference
  • 14:54: so we want to go to the first row here
  • 14:57: and we want to look up the reservation
  • 14:58: price column
  • 14:59: so that's why we get the single value of
  • 15:02: 10.12
  • 15:05: and then we can also query for whatever
  • 15:07: rows or columns of the data frame that
  • 15:09: we want
  • 15:10: so here we're going to get any rows
  • 15:13: which have a reservation price
  • 15:15: which is less than 14. so we can see the
  • 15:18: row
  • 15:19: the gym row which had a 15 reservation
  • 15:22: price is no longer here in this
  • 15:24: query result of the data frame but we do
  • 15:27: have the other rows because their
  • 15:28: reservation price was less than 14.
  • 15:32: um so you can kind of read this syntax
  • 15:34: as
  • 15:35: give me the data frame where
  • 15:38: uh the data frames reservation price
  • 15:42: is less than 14.
  • 15:45: so you do have to like repeat this data
  • 15:47: frame variable
  • 15:49: uh you can't just do something like
  • 15:52: that it's not going to understand that
  • 15:55: um
  • 15:56: it does have to be look up the data
  • 15:59: frame
  • 15:59: where the data frames reservation price
  • 16:03: is less than 14 and then it can work
  • 16:11: so we can do multiple queries
  • 16:14: at once as well you just have to
  • 16:18: separate them with an ampersand and put
  • 16:21: parentheses
  • 16:22: around each one of your queries so
  • 16:26: here we're doing what we did before and
  • 16:28: getting the reservation prices which are
  • 16:29: less than 14.
  • 16:31: but we're also going to get only those
  • 16:34: rows
  • 16:35: which have the percentage active greater
  • 16:38: than
  • 16:38: sixty percent so joe had sixty percent
  • 16:41: that's not greater than sixty percent
  • 16:44: and so that's why when we run this we're
  • 16:46: left only with
  • 16:47: the mary row that's the only one that
  • 16:49: satisfies
  • 16:50: all of the conditions that we've given
  • 16:52: to it
  • 16:53: but it's very important to include the
  • 16:55: parentheses if you omit any of those
  • 16:58: then it's not going to work
  • 17:00: appropriately
  • 17:01: you don't need them in this special case
  • 17:04: of just doing a single query
  • 17:05: they don't hurt but you don't need them
  • 17:08: whereas with multiple queries you
  • 17:10: definitely do need them
  • 17:18: um and then we can pass these same kind
  • 17:21: of queries
  • 17:22: into the loc command as well
  • 17:26: and for that um
  • 17:29: we do the same kind of syntax for the
  • 17:32: row part
  • 17:33: of the loc and then we can also pass it
  • 17:36: whatever columns that we want to get
  • 17:38: so we can take this same exact query
  • 17:40: that we had here to get this
  • 17:41: mary row and that comes as the first
  • 17:44: argument
  • 17:45: in loc and then the second argument is
  • 17:48: what columns we want to get
  • 17:50: so here we're getting just the name and
  • 17:51: the weight columns and that's why we see
  • 17:54: just mary's name and weight
  • 17:56: and you also could give it just a single
  • 17:59: column as well
  • 18:01: and that is going to just give you that
  • 18:03: value
  • 18:07: so that's selecting things out of data
  • 18:10: frames
  • 18:11: let's look at some basic math we can do
  • 18:13: so
  • 18:14: you can take an entire column at once
  • 18:18: from a data frame and do math with it
  • 18:20: which definitely simplifies things
  • 18:22: whereas before we would have always had
  • 18:24: to create loops
  • 18:25: to be able to apply these operations
  • 18:28: across all the different values of our
  • 18:29: data
  • 18:30: here it's one simple expression in
  • 18:33: pandas
  • 18:34: so we can just add 10
  • 18:37: to the reservation price column and we
  • 18:39: get a new column which has 10
  • 18:41: added to each of the values we can also
  • 18:44: do math with multiple different columns
  • 18:46: of the data frame
  • 18:47: we could multiply this price by the
  • 18:49: percentage active
  • 18:51: and get the result of that here
  • 18:54: and we can um
  • 18:57: any data frame which has just numbers we
  • 19:00: can
  • 19:00: do math with that as well so here
  • 19:04: taking the reservation price and weight
  • 19:06: columns out of the data frame
  • 19:08: we can multiply those both by 10 all at
  • 19:11: once
  • 19:12: um and
  • 19:15: you can also you know take the result of
  • 19:18: these things and assign them back
  • 19:19: into the data frame as as new columns
  • 19:22: if you would like
  • 19:26: we can also do some summary statistics
  • 19:29: very easily
  • 19:30: with data frames so
  • 19:33: dot describe kind of gives you the
  • 19:35: overview of all the different
  • 19:36: summary statistics you get how many
  • 19:40: there are of each you get the average
  • 19:42: standard deviation minimum different uh
  • 19:44: percentiles
  • 19:47: or you can do each these things
  • 19:49: individually so
  • 19:51: you know dot mean to get the averages
  • 19:53: dot std to get the standard deviations
  • 19:56: dot quantile to get here would be the
  • 19:59: 50th percentile
  • 20:00: which is the median and you can do
  • 20:03: whatever percentile that you want in
  • 20:05: this
  • 20:07: min to get the minimum max to get the
  • 20:10: maximum
  • 20:12: and any of these operations you can
  • 20:15: apply it across the row instead of the
  • 20:18: column
  • 20:19: if you would like um and so
  • 20:22: this is taking an average of all the
  • 20:25: numerical values
  • 20:26: in a given row and so this is saying
  • 20:29: this is the average of joe's values
  • 20:31: this is the average of james values and
  • 20:33: so on
  • 20:34: and so you just pass this axis equals
  • 20:37: one argument
  • 20:38: to one of these summary functions to be
  • 20:41: able to do that
  • 20:43: and axis equals one just means work over
  • 20:45: the rows instead of working over the
  • 20:47: columns
  • 20:51: and then you know these are all the kind
  • 20:53: of built-in things but panus also has a
  • 20:55: way that
  • 20:56: you can apply any function you want
  • 20:59: across
  • 20:59: a data frames values so here's another
  • 21:02: application where
  • 21:03: creating functions for all our different
  • 21:05: logical steps becomes very useful
  • 21:07: because then we can take this function
  • 21:09: we created and apply it
  • 21:11: to every single cell in the data frame
  • 21:14: um
  • 21:14: and so here this is a simple function
  • 21:17: that just takes the value
  • 21:18: and multiplies it by 100 and returns the
  • 21:20: result
  • 21:22: and it's this apply map that is able to
  • 21:26: take a function and apply it
  • 21:29: to each individual cell in the data
  • 21:31: frame
  • 21:32: and return a new data frame which has
  • 21:34: the result of all those calculations
  • 21:38: so we run that and we can see we get 100
  • 21:42: times each of the values that were there
  • 21:44: before
  • 21:45: and even for the strings we get uh those
  • 21:48: repeated 100 times
  • 21:50: so it was able to take this and apply it
  • 21:52: to all the different
  • 21:53: cells in the data frame now one thing
  • 21:56: that you'll notice
  • 21:57: is there's no open close parenthesis
  • 22:01: and passing some kind of number or
  • 22:03: arguments that you would expect to
  • 22:05: normally see
  • 22:06: with a function and that's because
  • 22:09: when we um
  • 22:12: do that we're calling the function right
  • 22:14: so we pass it a 5 and we get 500
  • 22:17: and now the result of this is just that
  • 22:19: 500
  • 22:20: this function has been called it's been
  • 22:22: evaluated it's gone now we're just left
  • 22:24: with 500.
  • 22:26: when we don't do the parentheses that's
  • 22:29: the function
  • 22:30: a reference to the function itself so
  • 22:32: that's why we see
  • 22:33: what is this thing it's a function it's
  • 22:36: the function multiplied by hundred
  • 22:39: and so with this structure we're passing
  • 22:41: the function itself
  • 22:43: into this other function apply map
  • 22:46: so that apply map can take this and
  • 22:48: apply it to each of the cells we have to
  • 22:50: pass the function
  • 22:51: itself and not the result of calling
  • 22:54: that function so this definitely is a
  • 22:57: concept that a lot of people struggle
  • 22:59: with
  • 22:59: in the beginning uh but we want to pass
  • 23:03: the function itself and so we don't put
  • 23:05: the parentheses
  • 23:06: whereas essentially other every other
  • 23:08: use case of functions
  • 23:09: you do want to call it and get the
  • 23:11: result of that
  • 23:13: and so you use the parentheses and pass
  • 23:16: whatever arguments
  • 23:19: so that's a quick overview on pandas
  • 23:22: and data frames and we're going to come
  • 23:24: back
  • 23:25: next time to look at some ways that we
  • 23:28: can style
  • 23:30: our data frames to make them look better
  • 23:33: so thanks for listening and see you next
  • 23:35: time

Styling Pandas DataFrames


Notes

  • Just as it is important to format tables in Excel to increase readability, we should do the same with any Pandas DataFrames we display to the reader of the model

  • There is a philosophical difference in how styling is done in Excel versus Pandas. In Excel, you directly format the table which stores your data. In Pandas, you create a styled object immediately before displaying which is separate from the original data, the data itself does not get formatted

  • Because of this difference in philosophy, the way I recommend working with Pandas styling is to create a styler function that accepts a DataFrame and returns the styled object. This way you can just call it on your DataFrame as you display it. This has a couple advantages: your data logic is completely separate from formatting code, and you can apply consistent formatting to multiple different DataFrames easily.

Transcript

  • 00:02: hey everyone
  • 00:03: nick dear versus here teaching you
  • 00:05: financial modeling today we're going to
  • 00:07: be talking about
  • 00:08: styling pandas data frames as part of
  • 00:11: our lecture series on
  • 00:12: understanding complex results with
  • 00:14: visualization
  • 00:16: so we learned last time about these
  • 00:19: pandas
  • 00:19: data frames which
  • 00:22: let you get a tabular representation of
  • 00:26: your data
  • 00:27: in python and
  • 00:31: we just had kind of the basic default
  • 00:33: way that these things looked
  • 00:35: and now we're going to learn how we can
  • 00:37: add our own custom
  • 00:39: styles and formatting to our data frames
  • 00:44: so where this is important is you
  • 00:46: definitely want to have
  • 00:47: number formatting when you display
  • 00:50: any tables to the reader of your model
  • 00:53: so that it's much easier to understand
  • 00:56: what those numbers mean
  • 00:58: and you may want to add additional
  • 01:00: styles
  • 01:02: especially something like conditional
  • 01:04: formatting where
  • 01:05: you highlight certain values based on
  • 01:07: some conditions
  • 01:09: to make it easier for the reader to
  • 01:12: understand the
  • 01:13: data in your table
  • 01:17: now most people probably already are
  • 01:21: very familiar with styling
  • 01:23: tables in excel you know you can change
  • 01:27: the number formatting you can change
  • 01:29: uh add borders you can change colors
  • 01:34: all these kind of things are just up
  • 01:36: there available in the ribbon
  • 01:38: with pandas uh of course you're using
  • 01:42: lines of code
  • 01:43: to do this formatting uh
  • 01:46: but it's a lot more flexible than the
  • 01:49: formatting
  • 01:50: in excel and the reason that
  • 01:53: it's more flexible is
  • 01:57: the way that it's built so
  • 02:00: when you see that table representation
  • 02:04: of the data the way that
  • 02:08: pandas the authors of pandas have made
  • 02:10: this happen
  • 02:11: is it actually produces
  • 02:15: html and css for
  • 02:18: this data to display this data frame
  • 02:22: and html and css for people that aren't
  • 02:25: familiar
  • 02:26: uh together make up the markup language
  • 02:30: which describes how any web page
  • 02:34: looks so anything that you've ever seen
  • 02:37: on any website
  • 02:38: you could potentially make your data
  • 02:40: frame look like that
  • 02:42: and basically anything at all because of
  • 02:44: how flexible
  • 02:45: html and css are
  • 02:48: and so you can go all the way to
  • 02:51: that complete control by directly
  • 02:54: manipulating the html and css
  • 02:57: of the representation of your data frame
  • 03:01: but that is definitely a little bit more
  • 03:02: advanced
  • 03:04: they've also added a lot of more
  • 03:06: convenient things that you can just
  • 03:08: easily do out of the box with simple
  • 03:09: commands
  • 03:11: that represent some of the most common
  • 03:12: things that you would want to do
  • 03:14: with your formatting so things like
  • 03:18: changing colors size positioning of text
  • 03:21: adding captions to your table
  • 03:22: conditional formatting
  • 03:24: and uh drawing a bar graph within the
  • 03:28: table
  • 03:29: these things are all uh kind of well
  • 03:32: explained and easily there out of the
  • 03:34: box
  • 03:34: but anything that you can possibly
  • 03:36: imagine
  • 03:37: you can do that with styling data frames
  • 03:42: so let's go ahead and take a look at the
  • 03:45: example
  • 03:46: notebook on what we can do so
  • 03:49: first let's talk about number formatting
  • 03:52: so
  • 03:52: we've already learned about number
  • 03:54: formatting with just plain numbers
  • 03:57: and python and we've been doing that
  • 03:59: using f strings
  • 04:01: so with the f strings uh we take
  • 04:04: the number and we put it in this
  • 04:08: quote quoted string with an f on the
  • 04:10: beginning
  • 04:12: and then inside that we do the curly
  • 04:14: braces
  • 04:16: and then within the curly braces we put
  • 04:18: our variable
  • 04:20: on the left and then a colon and then we
  • 04:22: put the format
  • 04:23: code that we want to use to describe the
  • 04:27: data
  • 04:31: and so we're going to take that same
  • 04:33: concept over to pandas
  • 04:36: as well now before we actually
  • 04:39: style our first data frame i do want to
  • 04:41: mention kind of a philosophical
  • 04:44: difference between the formatting
  • 04:48: in excel and the formatting of pandas
  • 04:51: data frames and python
  • 04:53: so in excel you have your data in a
  • 04:55: table and then you go and you format
  • 04:57: that table and now
  • 04:59: the original data is formatted
  • 05:03: but in python with pandas
  • 05:06: we have a separation of the data and the
  • 05:10: display
  • 05:11: of the data instead of directly styling
  • 05:14: the original data
  • 05:16: we actually create a separate object a
  • 05:20: style
  • 05:21: object which
  • 05:24: is the formatted representation of the
  • 05:26: data
  • 05:27: but that does not carry those changes
  • 05:30: back to
  • 05:31: the original data the original data is
  • 05:33: still unformatted
  • 05:35: so we only apply this formatting just as
  • 05:38: we're about to display
  • 05:40: this information to the reader of the
  • 05:42: model
  • 05:44: and that's the same way that things are
  • 05:47: working here with the f strings
  • 05:49: and formatting numbers in python
  • 05:52: you know we did this but my num
  • 05:55: is still the same exact number it
  • 05:58: doesn't have this
  • 05:59: formatting on it it was just as soon as
  • 06:01: we were ready to display that to the
  • 06:03: reader of the model
  • 06:05: then we added the formatting just in
  • 06:07: that spot it did not carry back
  • 06:09: to the original data and it's going to
  • 06:11: be the same exact concept
  • 06:13: with our pandas styling
  • 06:17: so now let's look at we were able to
  • 06:20: apply you know two decimal place
  • 06:23: formatting here if we have more decimals
  • 06:26: then it cuts it off at two because of
  • 06:28: this formatting
  • 06:31: let's look at applying that to the data
  • 06:34: frame as well
  • 06:37: so most of the things that we're going
  • 06:39: to do
  • 06:41: with styling data frames it's going to
  • 06:42: be dot style
  • 06:45: on the data frame and a lot of them are
  • 06:48: through this
  • 06:48: dot format then uh under the dot style
  • 06:52: so df.style.format and
  • 06:56: then for the number formatting we pass
  • 06:59: it to this format and we give it a
  • 07:00: dictionary and the dictionary should
  • 07:04: have
  • 07:04: the keys of the dictionary should be the
  • 07:06: names of the columns
  • 07:08: that we want to format so here we're
  • 07:09: formatting the reservation
  • 07:11: price column and the values
  • 07:14: should be this format
  • 07:18: you know very similar to what we did
  • 07:20: here a format string
  • 07:22: for how we want to format the data now
  • 07:24: the one difference that you'll see
  • 07:26: in what we had here versus what we have
  • 07:28: here
  • 07:29: is this one does not one does not
  • 07:31: include the f
  • 07:32: and two it does not include the variable
  • 07:34: itself
  • 07:36: so after your braces you then start with
  • 07:38: the colon
  • 07:39: as the first thing it knows that
  • 07:42: basically the variable is going to get
  • 07:44: inputted in here
  • 07:46: and then colon and then the same exact
  • 07:48: format code
  • 07:50: that you use for the f strings
  • 07:53: so when we run this then we see
  • 07:57: uh that it has put that dollar sign and
  • 07:59: the two decimal place
  • 08:01: formatting on there just to show the
  • 08:04: decimals if we put one
  • 08:05: then it would only be a single decimal
  • 08:07: place showing up there
  • 08:11: now we created this s variable
  • 08:15: as df.style.format the result of that
  • 08:18: and when we look at s it looks like a
  • 08:20: data frame
  • 08:21: right so you might include that you know
  • 08:24: now this
  • 08:25: s is our data we just have a formatted
  • 08:28: version of the data like we do in excel
  • 08:31: but actually if we check the type of
  • 08:33: this it's a
  • 08:34: styler object no longer a data frame
  • 08:39: and all of a sudden we can't do
  • 08:43: some of the things that we did before
  • 08:45: with it because it's not a data frame
  • 08:47: anymore
  • 08:47: it's a styler and so that's why i say
  • 08:50: you always just want to
  • 08:52: apply this formatting immediately before
  • 08:54: you're going to display it
  • 08:56: and not try to format your existing
  • 08:59: data frame because that's just not the
  • 09:02: way this works
  • 09:04: and we'll look at a pattern of what i
  • 09:07: think is a nice
  • 09:08: way to apply this formatting in your
  • 09:10: models
  • 09:12: but let's look through the other
  • 09:15: formatting methods first
  • 09:18: so we can apply that say percentage
  • 09:21: formatting
  • 09:22: to percentage active here uh zero
  • 09:25: decimal place
  • 09:26: percent formatting um
  • 09:29: and you can continue to chain additional
  • 09:32: formatting on a styler
  • 09:34: object so you can style stylers you can
  • 09:37: style data frames
  • 09:39: but you can only do math and other
  • 09:41: operations with
  • 09:42: data frames um
  • 09:46: so here we're adding additional
  • 09:48: formatting of we also want to format the
  • 09:50: percentage active volume
  • 09:51: column with the percentages so now we
  • 09:53: have both of these two columns formatted
  • 09:57: um but you could have also just directly
  • 10:00: done it from the data frame
  • 10:01: in the first place by giving it both of
  • 10:03: these number formats
  • 10:04: at the same time just passing both those
  • 10:08: in the dictionary
  • 10:11: so that's number formatting now let's
  • 10:13: look at cell formatting
  • 10:17: so cell formatting we can do with again
  • 10:21: pretty much everything is on dot style
  • 10:23: and then
  • 10:25: just as we looked at how we can do apply
  • 10:27: map to apply a function
  • 10:29: to every cell of the data frame
  • 10:32: we can similarly use dot style dot apply
  • 10:35: map
  • 10:36: to apply formatting to each cell
  • 10:39: of the data frame of the styler object
  • 10:44: um so
  • 10:47: we can see here what this is doing we
  • 10:50: can already see the result of this this
  • 10:51: made all the cells values blue
  • 10:55: and so what we did here was we created
  • 10:59: a function that returns
  • 11:02: the style that we want for each cell
  • 11:07: so this is the general structure of how
  • 11:08: to format cells in pandas you write
  • 11:11: functions which return the styles you
  • 11:14: want to apply
  • 11:16: and then you use apply map to apply
  • 11:18: those styles
  • 11:20: so here we're setting the color to blue
  • 11:23: for
  • 11:24: each one of these cells
  • 11:28: and this color blue this is actually
  • 11:31: uh css as i mentioned uh
  • 11:35: html and css is all what's going on in
  • 11:37: the background here
  • 11:39: so uh you can take a look at um
  • 11:43: you know here's a color picker where you
  • 11:45: can
  • 11:46: look at different colors and you can
  • 11:49: grab
  • 11:49: the value that you want for
  • 11:53: your formatting and put that in there
  • 11:57: whatever color that you want
  • 12:00: or there are just you know predefined
  • 12:04: names
  • 12:05: for a lot of common colors as well and
  • 12:08: you can just google about
  • 12:10: css colors to learn more about all this
  • 12:17: um so then color
  • 12:20: is about the text color and background
  • 12:23: color
  • 12:24: is about the background color of the
  • 12:26: cells
  • 12:27: so similarly here we've created a
  • 12:29: function which returns this background
  • 12:30: color
  • 12:32: um of light green and we're applying
  • 12:35: that to all the cells and so we see them
  • 12:37: all highlighted in light green
  • 12:40: text align will let you control the
  • 12:43: alignment
  • 12:44: of the text within the cells so here
  • 12:46: we're now centering
  • 12:48: each of these values
  • 12:52: and you can do multiple at once
  • 12:55: by just putting semicolons uh
  • 12:59: between each of these definitions so
  • 13:02: you should always be returning a single
  • 13:04: string but
  • 13:07: each style is going to be the name of
  • 13:10: what you want
  • 13:10: to affect and then a colon
  • 13:14: and then the style you want to apply
  • 13:18: and then for any additional ones you put
  • 13:20: a semicolon and then repeat the same
  • 13:22: kind of thing again
  • 13:24: so here text color to white uh
  • 13:27: background color to black
  • 13:28: alignment to center and we see all of
  • 13:31: that happening here at once
  • 13:35: so that leads us into thinking about
  • 13:37: conditional formatting where conditional
  • 13:39: formatting is
  • 13:40: we want to apply formatting to some
  • 13:43: subset
  • 13:44: of the cells whatever cells meet
  • 13:47: some certain condition that we want to
  • 13:49: apply
  • 13:52: so here let's think about
  • 13:57: you know this is a very short simple
  • 13:59: example we only have three people it's
  • 14:00: easy to just look through
  • 14:02: the values for three people but you
  • 14:03: could imagine you could have
  • 14:05: hundreds thousands of people in this
  • 14:07: table
  • 14:09: and what we want to try and do here is
  • 14:11: highlight the people who are not very
  • 14:12: active
  • 14:14: and so if their activity percentage is
  • 14:16: less than 50
  • 14:18: then we want to highlight that
  • 14:21: cell so the way that we can do that
  • 14:25: is just adding that conditional logic
  • 14:28: into our function
  • 14:29: which returns the style and knowing that
  • 14:32: if we return an empty string
  • 14:34: then it's not going to apply any style
  • 14:37: but if we return that string with the
  • 14:39: value
  • 14:40: then it is going to apply that style
  • 14:43: so here the function does two things
  • 14:47: so first um it checks
  • 14:50: if the value we're getting is a string
  • 14:52: so this is
  • 14:53: something we haven't seen before is
  • 14:55: instance is a way of checking the type
  • 14:58: of some object so here we're checking
  • 15:02: is the value that we're getting a string
  • 15:04: if so we're not going to format it
  • 15:06: and the reason we've added this let me
  • 15:08: go ahead and comment this out
  • 15:11: so that now this is all that is
  • 15:13: executing in the function
  • 15:15: we run that and we see we get an error
  • 15:17: right away
  • 15:18: type error because less than
  • 15:22: is not supported between a string and
  • 15:25: float and that's because uh
  • 15:28: we're checking if value is less than 0.5
  • 15:31: and we have jim
  • 15:32: joe and mary as values in there and so
  • 15:35: now it's trying to say
  • 15:36: uh you know like like joe
  • 15:40: is joe less than point five we see that
  • 15:43: same exact
  • 15:44: error so that's that's the problem
  • 15:45: that's going on here we don't want to
  • 15:47: compare the strings we never want to
  • 15:49: highlight the strings we just want to
  • 15:50: ignore them
  • 15:52: we do want to take the numeric values
  • 15:54: and do that comparison
  • 15:56: so that's all that this first part means
  • 15:59: is just basically skip
  • 16:01: the string values just return no style
  • 16:04: for the string values then we come to we
  • 16:07: know we've got a number now
  • 16:09: so now we're comparing is that number
  • 16:11: less than 0.5
  • 16:12: if so we're gonna make the background
  • 16:14: pink
  • 16:16: um otherwise the value is greater than
  • 16:19: 0.5 and so we're going to
  • 16:22: just return no style and so that's why
  • 16:25: we see
  • 16:26: after we apply this style to the data
  • 16:28: frame
  • 16:29: that only this one jim
  • 16:32: jim value is highlighted because jim is
  • 16:35: the only
  • 16:36: inactive individual in the data frame
  • 16:41: um and you know say you're trying to
  • 16:45: develop this function
  • 16:46: and things are not working out the way
  • 16:48: that you expected
  • 16:50: one shortcut or pattern you can use to
  • 16:53: be able to test out
  • 16:55: you know what what am i actually getting
  • 16:57: in each of these cells as far as the
  • 16:58: styling
  • 17:00: if instead of doing df.style.applymap
  • 17:03: you just do df.applymap remember we
  • 17:06: learned that's taking each cell and
  • 17:07: applying a function to it
  • 17:09: then you can see what styles are going
  • 17:11: to get applied in each cell
  • 17:12: so here no styles to each of these cells
  • 17:15: and here this one is getting
  • 17:17: background color pink so a nice way if
  • 17:20: your styles just are not coming out the
  • 17:22: way you expected
  • 17:23: see what am i actually getting in each
  • 17:25: of these cells
  • 17:30: and that's formatting individual cells
  • 17:32: we can also do formatting with the
  • 17:33: entire table
  • 17:34: at once so you can hide the index the
  • 17:38: index
  • 17:39: uh is this zero one two thing that goes
  • 17:41: down the side
  • 17:42: um so you don't wanna see that then you
  • 17:45: can just hide that
  • 17:47: you can hide individual columns if you
  • 17:50: want to exclude those just for the
  • 17:51: display purposes
  • 17:54: you can add a caption or title to the
  • 17:57: table
  • 17:58: um and there's a whole lot more that you
  • 18:02: can do
  • 18:02: this gets into uh how you can
  • 18:05: basically um do whatever you want
  • 18:09: with data frames and this example comes
  • 18:12: right out of the pandas
  • 18:14: documentation actually here is adding
  • 18:17: a hover style to the data frame so it
  • 18:20: does something different
  • 18:22: when you hover over it
  • 18:25: so we're not going to go into the
  • 18:27: details here this
  • 18:28: is really getting outside the scope of
  • 18:30: the course but i'm just showing this to
  • 18:32: kind of highlight you can do whatever
  • 18:34: you want
  • 18:35: with these uh styling of the data frames
  • 18:39: really
  • 18:39: the limit is only your imagination and
  • 18:42: willingness to
  • 18:43: figure out the specific code to make
  • 18:45: that happen
  • 18:51: one other nice uh thing that we can do
  • 18:53: very easily
  • 18:54: with these styling of data frames
  • 18:59: is it has this style.bar
  • 19:02: method which draws bar
  • 19:06: bar plots within the individual cells
  • 19:10: of the data frame and so if
  • 19:13: obviously we just have three rows it was
  • 19:15: already fairly easy to see but if you
  • 19:17: have a lot more this is
  • 19:18: getting a lot more useful uh from this
  • 19:21: we can easily see what are the smallest
  • 19:23: values in each of the columns what the
  • 19:25: largest values in each of the columns
  • 19:27: based off of those bars and you can
  • 19:31: change the color of those
  • 19:32: by passing a color argument and there
  • 19:35: are three different ways that we can
  • 19:37: align
  • 19:38: those bars as well
  • 19:41: so this is the
  • 19:45: default alignment of it
  • 19:49: which is the left alignment
  • 19:52: and so that's saying that the lowest
  • 19:53: value is not going to show up
  • 19:55: the highest value is going to be full
  • 19:57: and the rest are
  • 19:58: kind of spread in between
  • 20:01: or you can align from zero meaning that
  • 20:05: empty bar is representing 0
  • 20:09: and a full bar is
  • 20:13: again the max
  • 20:17: and we also have this mid alignment
  • 20:21: which puts zero
  • 20:24: at the middle negative values are going
  • 20:27: to be below that
  • 20:28: and positive values are going to be
  • 20:30: above that and we can also pass a subset
  • 20:33: to give it
  • 20:34: only certain columns that we want to
  • 20:36: apply this to instead of applying it to
  • 20:38: every numeric column
  • 20:42: so i mentioned i would show you a
  • 20:43: pattern on the best way
  • 20:46: to organize yourself with this styling
  • 20:49: because again you don't apply styling to
  • 20:52: the original data frame
  • 20:54: you're creating a new object a styler
  • 20:56: object which has the formatting which is
  • 20:58: separate
  • 20:59: from your original data so
  • 21:03: that df there does not have any of this
  • 21:06: styling that we just
  • 21:07: played around with in all these prior
  • 21:08: sections the original data is still as
  • 21:11: it was
  • 21:14: um and so the way that we can work with
  • 21:17: this in a nice way
  • 21:18: is by creating styler functions
  • 21:21: so you create a function which takes the
  • 21:24: data frame
  • 21:25: as the input and returns the styler
  • 21:28: object and that way all your
  • 21:32: formatting code is totally separate from
  • 21:35: all your data logic
  • 21:37: and you can potentially style multiple
  • 21:39: different data frames
  • 21:40: in the same way by just passing them to
  • 21:43: the same
  • 21:44: styler function this also shows you how
  • 21:47: you can
  • 21:48: chain a bunch of different uh formatting
  • 21:51: commands
  • 21:52: in one go uh if you use the parentheses
  • 21:55: that you can break this onto multiple
  • 21:57: lines then it looks pretty nice
  • 21:59: as well that first we're going to format
  • 22:01: the numbers
  • 22:03: with for the two columns and the ways
  • 22:04: that we talked about and then i want to
  • 22:06: highlight the inactive ones
  • 22:08: and i want to center the values and i
  • 22:11: want to hide the index
  • 22:12: and i want to give it a title of
  • 22:13: personal info and i want to draw the
  • 22:16: bars
  • 22:18: for the reservation price column so
  • 22:21: defining that function
  • 22:23: then we can call that on any data frame
  • 22:26: that we want so now we see the result
  • 22:29: of all those changes all together
  • 22:32: caption
  • 22:33: the centering the bar the highlighting
  • 22:35: the inactive
  • 22:36: the number formatting and now it all
  • 22:39: happens from
  • 22:40: one single function and because we have
  • 22:44: that as a function we can apply it to
  • 22:45: any data frame
  • 22:46: that we want um and so you know here
  • 22:50: we're
  • 22:50: we're making a new data frame which is
  • 22:52: just pulling out the reservation price
  • 22:55: and percentage active columns and
  • 22:57: multiplying them by 0.8
  • 22:59: we can take that same one and we can
  • 23:01: pass it to the same
  • 23:02: stylor function and we see that style
  • 23:05: now in the same way
  • 23:10: then there are also a few other
  • 23:12: shortcuts that pandas has
  • 23:15: added that are nice to know about you
  • 23:18: can
  • 23:18: easily highlight the maximum value in a
  • 23:21: column
  • 23:22: or the minimum value and you can
  • 23:26: pass similar arguments like we did for
  • 23:27: the bars color
  • 23:29: a subset
  • 23:32: same thing on the minimum you can pass
  • 23:34: those same kind of values
  • 23:36: background gradient is a really nice one
  • 23:38: for
  • 23:40: just giving a color map
  • 23:43: to the cells highlighting them based on
  • 23:46: their values
  • 23:48: and you can pass what specific type
  • 23:52: of color map that you want i mean if you
  • 23:55: want to figure out what
  • 23:56: are the possible
  • 24:00: values that i can use in that you can
  • 24:02: google matplotlib color
  • 24:04: maps again remember that pandas uses
  • 24:06: matplotlib under the hood for
  • 24:09: all these things and you can see
  • 24:12: this lists out all the pop all the
  • 24:14: possible
  • 24:16: color maps that you can use but the one
  • 24:19: that we'll
  • 24:20: focus on here and of course most often
  • 24:22: is
  • 24:23: this uh red yellow green color map
  • 24:26: um and you can do underscore r to
  • 24:30: reverse any color map
  • 24:32: so uh red yellow green
  • 24:35: is going to be the highest values in
  • 24:37: green the lowest values in red and a
  • 24:39: gradient in between
  • 24:41: and with the underscore r then it's the
  • 24:43: opposite highest values in red lowest
  • 24:46: values in green
  • 24:47: and a gradient in between
  • 24:50: and we'll see this definitely is a
  • 24:52: really nice way to be able to get
  • 24:55: a nice quick way to see where the
  • 24:58: largest and smallest values in a table
  • 25:00: are so that's an overview
  • 25:03: on pandas styling we're going to come
  • 25:06: back next time
  • 25:07: to learn about graphing with pandas
  • 25:10: and python so thanks for listening and
  • 25:13: see you next time

Introduction to Graphs in Python with Pandas


Notes

  • All the main graph types that you would expect are available in Pandas

  • See the official Pandas visualization guide on how to adjust any plots to your liking, but the defaults are already pretty good

Transcript

  • 00:02: hey everyone
  • 00:03: nick duraburtis here teaching you
  • 00:05: financial modeling today we're going to
  • 00:07: be talking about
  • 00:08: creating graphs in python using pandas
  • 00:12: this is part of our lecture series on
  • 00:14: understanding complex
  • 00:15: results through visualization
  • 00:18: so we learned in the last couple videos
  • 00:22: how we can use pandas in python as
  • 00:25: a way of putting our data into a tabular
  • 00:28: format and
  • 00:30: the other main usage that we're going to
  • 00:32: have for pandas is then to produce
  • 00:35: graphs from those data so
  • 00:39: now that we already have a data frame to
  • 00:42: work with
  • 00:43: it's quite easy to produce these plots
  • 00:46: so here's a quick example
  • 00:50: of doing a plot in pandas
  • 00:54: and really it comes down to one line of
  • 00:57: code if we already have our data frame
  • 00:59: variable
  • 01:00: then you do dot plot and then dot
  • 01:03: whatever type of plot that you want to
  • 01:05: do and you tell it what are the x and y
  • 01:08: axes uh giving it the column names
  • 01:12: of the data in your data frame
  • 01:15: and that's that's it then it will
  • 01:17: produce the plot for you
  • 01:19: um so quite straightforward the other
  • 01:22: part that you see here is this
  • 01:24: matplotlib
  • 01:25: inline thing don't worry too much about
  • 01:28: that
  • 01:29: that's just something for jupiter
  • 01:32: notebooks
  • 01:32: that allows it to display these plots
  • 01:36: appropriately in all cases
  • 01:37: so you just add this thing at the top of
  • 01:39: your notebook you can put it with your
  • 01:41: imports
  • 01:41: just do it one time just have it there
  • 01:43: at the top and then
  • 01:45: everything going forward you just do uh
  • 01:48: dot plot
  • 01:48: dot whatever to get your actual plot
  • 01:52: and for any of the main chart types that
  • 01:55: you expect
  • 01:56: you have that in pandas um
  • 02:00: so for all of the basic
  • 02:03: plot types um you know we pretty much
  • 02:07: have
  • 02:08: a python version which is analogous to
  • 02:10: the excel version
  • 02:12: um so here you know seeing a
  • 02:15: line plot and uh created from
  • 02:19: pandas versus align plot in excel
  • 02:24: we have these column or bar charts
  • 02:27: uh you can do box and whisker or
  • 02:30: whatever really else
  • 02:32: kind of all the basic general plot types
  • 02:37: so let's go look at an example of this
  • 02:42: so coming over to the intro to graphics
  • 02:44: jupiter notebook
  • 02:46: so first thing we're going to do is
  • 02:49: import pandas
  • 02:50: and we're going to run this percent
  • 02:53: matplotlib and line thing
  • 02:54: again don't need to worry much about
  • 02:57: what that is just do it once at the top
  • 02:59: of your notebook
  • 03:03: before we can plot something with pandas
  • 03:05: we have to have a data frame which has
  • 03:06: data in it
  • 03:07: so we'll go ahead and just create a data
  • 03:10: frame here so
  • 03:11: going with the style of creating an
  • 03:13: empty data frame and then assigning
  • 03:15: columns
  • 03:16: so here assigning uh some values
  • 03:19: this could be like a stock price over
  • 03:21: time
  • 03:22: uh and then get those time values
  • 03:25: um so now we've got a data frame here
  • 03:29: and here i'm actually doing what's
  • 03:31: called a list comprehension to
  • 03:33: produce those values that
  • 03:36: is basically creating this list we will
  • 03:39: cover this more later in the course you
  • 03:42: don't have to worry about it for now
  • 03:45: just know that all we're doing is
  • 03:46: assigning lists to create these columns
  • 03:52: so now that we have this data in this
  • 03:54: data frame
  • 03:55: we can plot it so you can just call
  • 03:59: justplot by itself without telling it
  • 04:02: the type of plot
  • 04:03: and it's going to try to do the best job
  • 04:06: it can
  • 04:06: in producing a plot so this is kind of
  • 04:09: analogous to
  • 04:10: the recommended charts in excel it's
  • 04:12: going to you know try to give you a
  • 04:14: recommended plot
  • 04:15: here in python um
  • 04:19: but we can see this is probably not what
  • 04:21: we wanted
  • 04:22: uh we want to see these values over time
  • 04:25: and right now it's plotting both the
  • 04:27: values and time
  • 04:29: um and it's going against this index
  • 04:33: this
  • 04:33: zero to 10 thing here which is why the
  • 04:36: first value we see
  • 04:37: is zero um so what we can do is we can
  • 04:41: tell it
  • 04:42: the y and x values and it can do
  • 04:45: a better job now knowing more about what
  • 04:47: we want
  • 04:49: so we know we want to have time on the
  • 04:51: x-axis and we want to have the values on
  • 04:53: the y-axis
  • 04:54: so we just pass that to the plot command
  • 04:58: values on the y and we want
  • 05:01: the t on the x the names the same names
  • 05:04: of the columns in this data frame
  • 05:08: and now this looks more like what we
  • 05:10: might have expected
  • 05:12: we have just the values plotted it's
  • 05:15: over time
  • 05:16: we already got an axis label for this
  • 05:18: automatically as well
  • 05:20: and we can see this axis represents
  • 05:24: uh values and now the axis has been fit
  • 05:27: well to the range
  • 05:28: of the values as well rather than
  • 05:32: you know having to try and fit both of
  • 05:34: these completely different
  • 05:35: uh scales on the same graph
  • 05:40: um and so that we just did with plot we
  • 05:44: said
  • 05:45: try and guess add the plot that we want
  • 05:48: um but you can also explicitly tell it
  • 05:51: what type of plot that you want
  • 05:55: and to figure out what type of plots
  • 05:58: are available you can
  • 06:02: after df.plot then do another dot and
  • 06:05: hit
  • 06:06: tab and you'll see the different
  • 06:08: possibilities come up
  • 06:10: so we can do all these different kinds
  • 06:13: of plots
  • 06:15: so these represent most of what you
  • 06:17: would probably expect to see
  • 06:19: as far as plot options
  • 06:22: and here's just them printed out in the
  • 06:24: notebook you don't have to worry about
  • 06:26: the code that does that that was just so
  • 06:28: i could get them printed here
  • 06:31: so let's look at a few examples so
  • 06:34: we can do a dot plot dot area again
  • 06:37: passing the same line x
  • 06:39: and that's the same kind of plot but it
  • 06:42: puts the full
  • 06:43: you know fills in the area under the
  • 06:44: line
  • 06:46: we got bar graphs dot bar
  • 06:50: uh again looks like we would expect
  • 06:53: shows the values over time
  • 06:57: and you have bar h to do a horizontal
  • 07:00: bar
  • 07:00: plot so same same thing as the paragraph
  • 07:03: just now
  • 07:04: rotated we can do a box and whisker plot
  • 07:07: with dot box
  • 07:11: the box box and whisker plots and the
  • 07:13: density and histograms those are all
  • 07:15: good
  • 07:16: for giving summaries of the distribution
  • 07:19: of the data
  • 07:22: so now we can see
  • 07:26: uh you know the full range of the data
  • 07:27: the interquartile range
  • 07:30: plotted on here so that gives a nice
  • 07:33: summary
  • 07:36: we can do you know both density and
  • 07:39: histogram are kind of getting out the
  • 07:41: same concept
  • 07:42: we want to see the distribution of
  • 07:45: the data across all the different values
  • 07:49: how frequently are we getting different
  • 07:50: values
  • 07:51: so the density is just a smooth
  • 07:55: version of the histogram so the
  • 07:58: histogram
  • 07:59: just puts things into buckets says you
  • 08:02: know this
  • 08:03: uh bucket is around 92.5 and then
  • 08:06: from like 93 to 95 is this bucket
  • 08:10: uh how many values fall within that
  • 08:12: range well three are within like the 93
  • 08:14: to 95 range
  • 08:16: and so you can see how many values occur
  • 08:19: in each of these
  • 08:20: ranges so the density plot is kind of
  • 08:23: the same thing
  • 08:24: we can see is something similar here
  • 08:27: only it's like smoothed out
  • 08:29: across uh all the different values
  • 08:33: rather than splitting it into buckets
  • 08:38: and pie chart doesn't really make sense
  • 08:41: for these data
  • 08:42: but you can certainly do that um
  • 08:45: and if you had data where it was more
  • 08:48: appropriate
  • 08:49: and then it's a good option uh such as
  • 08:51: frequencies or percentages for different
  • 08:53: things
  • 08:56: and we see all the values laid out in a
  • 08:59: pie chart here
  • 09:02: scatter plots the individual points
  • 09:06: and so this
  • 09:10: kind of wraps up all the different basic
  • 09:12: plot types that you would probably be
  • 09:14: using
  • 09:15: in this class and now of course there's
  • 09:17: way more that you can do
  • 09:19: by going to the base matplotlib that
  • 09:22: powers
  • 09:23: pandas plots but that's definitely
  • 09:26: getting outside the scope of this course
  • 09:28: we just want to be able to quickly get
  • 09:31: a good representation of our data
  • 09:34: and pandas gets us all the way there
  • 09:38: and if you want to customize your plots
  • 09:40: in any way
  • 09:41: just google about map matplotlib and
  • 09:44: matplotlib customization there's a whole
  • 09:46: lot there
  • 09:47: and you can then take these panties
  • 09:51: plots
  • 09:51: and add on that styling afterwards and
  • 09:54: customization afterwards
  • 09:58: so that's an overview on doing uh
  • 10:01: graphing with pandas and python
  • 10:05: next time we're going to come back and
  • 10:07: apply this
  • 10:08: uh this graphing as well as table
  • 10:11: visualization
  • 10:13: to the dynamic salary retirement model
  • 10:15: in python
  • 10:17: so thanks for listening and see you next
  • 10:22: time

Visualization in Python Example


Notes

  • If you have structured your model well, it should be easy to add visualization at the various stages of your model

  • Visualizations are especially helpful in Python as you don’t automatically see the tabular representation of the data like you do in Excel

Transcript

  • 00:02: hey everyone
  • 00:03: nick dear vertis here teaching you
  • 00:04: financial modeling and today we're going
  • 00:07: to be going over an example of how to do
  • 00:09: visualization in python as part of our
  • 00:12: lecture series on
  • 00:13: understanding complex results through
  • 00:16: visualization
  • 00:17: so we're going to add visualization
  • 00:20: to the dynamic salary retirement model
  • 00:23: which we've already built out in the
  • 00:25: course
  • 00:26: so i'm going to pull that up here
  • 00:29: and this is the same one which is
  • 00:30: available as a completed example
  • 00:32: on the course website so
  • 00:36: um we already have the retirement model
  • 00:39: and i'm just going to restart the kernel
  • 00:41: and
  • 00:41: run all cells so that we have everything
  • 00:44: defined
  • 00:47: and so we already have where we got to
  • 00:50: ultimately the result of
  • 00:52: it's going to take 28 years to retire
  • 00:54: and we just printed out strings
  • 00:56: of the wealth over time
  • 00:59: so now we want to do two things
  • 01:03: to visualize our results in a better way
  • 01:07: one is it would be good to have a table
  • 01:10: of
  • 01:11: these salaries and wells over time so
  • 01:13: that you can see both the salary and
  • 01:15: wealth together
  • 01:16: and each of those over time in a nice
  • 01:17: format
  • 01:19: and then the other uh thing we want to
  • 01:21: do to visualize
  • 01:22: is graph the salary and graph the wealth
  • 01:26: over time so
  • 01:29: we can create a new section of the model
  • 01:33: let's call this results summary
  • 01:38: and you would want to put some
  • 01:40: description of that
  • 01:41: here you can take a look at the
  • 01:43: completed example
  • 01:45: on the course website for having this
  • 01:48: thing fully polished with all the
  • 01:50: descriptions doc strings for functions
  • 01:53: etc
  • 01:54: we're just going to kind of build it out
  • 01:55: quickly here for sake of time
  • 01:59: so first thing we want to do
  • 02:03: is we want to create a data frame a
  • 02:05: pandas data frame
  • 02:06: which has our results so coming back to
  • 02:09: the top
  • 02:10: uh you'll notice that we don't yet have
  • 02:12: an import for pandas so we're going to
  • 02:14: import pandas
  • 02:16: as pd and then run this cell so that we
  • 02:19: can actually work
  • 02:20: with pandas the other thing that you'll
  • 02:22: notice because this is
  • 02:24: kind of the polished completed model is
  • 02:26: we don't have
  • 02:27: the uh setting model data to data so i'm
  • 02:30: going to add that
  • 02:32: back here so that we can develop our
  • 02:34: functions
  • 02:35: easily outside the function in the cell
  • 02:38: and then wrap it up into a function
  • 02:39: which accepts data
  • 02:41: and we're not going to accidentally use
  • 02:43: this global
  • 02:44: model data so
  • 02:47: coming back down to the bottom now i
  • 02:50: want to write out the logic which is
  • 02:52: going to
  • 02:52: put the uh salaries and wells into a
  • 02:56: data frame
  • 02:58: so to get there i'm going to start from
  • 03:01: the logic that we had before
  • 03:05: so i'm going to copy and paste that down
  • 03:07: to here
  • 03:09: highlight that all and hold shift and
  • 03:10: press tab to
  • 03:12: undo that and then and then
  • 03:17: you know if we take out the return part
  • 03:20: uh
  • 03:20: we run this it does the same thing as
  • 03:22: that function right
  • 03:24: but we don't care about the print
  • 03:25: display so i'm going to remove all the
  • 03:27: prints here
  • 03:29: i'm just hitting ctrl x that would be
  • 03:30: command x on a map
  • 03:32: mac to just remove a line
  • 03:35: and so now uh we
  • 03:38: have kind of the base logic without it
  • 03:40: displaying anything
  • 03:42: so what we want to do is we want to
  • 03:46: store
  • 03:47: the sell the salary and wealth results
  • 03:51: in each year so that we can put them all
  • 03:53: into a data frame
  • 03:55: and we don't actually have the salary
  • 03:56: separately in here within the wealthy
  • 03:58: year function it
  • 04:00: is determining a salary um so we're
  • 04:02: going to want to also
  • 04:03: separately calculate the salary so
  • 04:07: that was the salary year function that
  • 04:09: got us there
  • 04:10: and to that we passed the data and also
  • 04:13: the year
  • 04:15: so now we have uh calculating both the
  • 04:18: salary and the wealth
  • 04:19: in any given year
  • 04:23: so now we've got to store that data so
  • 04:25: we can create a data frame
  • 04:27: and any time where you have you know
  • 04:30: some kind of loop
  • 04:32: and you're ultimately going to want to
  • 04:34: put the results of that loop into a
  • 04:35: table
  • 04:36: and you have multiple different columns
  • 04:38: that you're going to want to have in
  • 04:39: that table that are all coming out of
  • 04:42: this same
  • 04:42: loop then i recommend using the
  • 04:44: structure of
  • 04:46: creating a list of tuples where
  • 04:50: you can then create the data frame all
  • 04:52: at once from that list of tuples
  • 04:54: you can certainly do the approach of
  • 04:57: creating the empty data frame and
  • 04:58: assigning columns
  • 05:00: but then you have to maintain separate
  • 05:01: lists for each one of the inputs which
  • 05:03: gets a little bit
  • 05:04: tedious um so first
  • 05:08: let's look at the recommended approach
  • 05:11: and then i'll quickly show
  • 05:13: the other approach just to show
  • 05:16: why it has drawbacks so
  • 05:20: my recommended approach is to create
  • 05:22: your
  • 05:24: list which is going to store all the
  • 05:26: tuples
  • 05:27: so you create that before the loop and
  • 05:30: then at the end of the loop
  • 05:31: we can append to that list and we're
  • 05:35: going to append
  • 05:36: a tuple which has the year the salary
  • 05:40: and the wealth all in it all three items
  • 05:42: together
  • 05:43: um and so then we can look at that
  • 05:48: df data tubes and we can see that gets
  • 05:51: us you know for each year we have the
  • 05:53: salary
  • 05:54: and the wealth so then we can
  • 05:57: create a data frame from that
  • 06:01: df is a pd.data frame from
  • 06:04: the list of tuples and we pass it the
  • 06:07: names of our columns
  • 06:09: so that would be year salary and
  • 06:12: wealth
  • 06:16: and then we can look at the data frame
  • 06:18: and then we see all this
  • 06:19: in the data frame format instead
  • 06:23: so definitely um i'm just going to go
  • 06:25: ahead and
  • 06:26: copy this cell so that we still have our
  • 06:30: you know good solution here and to show
  • 06:32: you the other way
  • 06:34: where we first create the empty data
  • 06:36: frame and then we want to assign the
  • 06:38: columns
  • 06:39: to do that you would have to have
  • 06:41: separate lists for
  • 06:43: each of your values so years salaries
  • 06:48: uh wealth and then you would do
  • 06:52: uh you know years dot append year
  • 06:55: you would do salaries
  • 06:59: uh dotted pen salary and you would do
  • 07:03: else without append wealth
  • 07:08: and then you would do df
  • 07:11: year equal years df
  • 07:14: wealth equals wealth
  • 07:18: and df salary equals
  • 07:21: salaries so this will produce the same
  • 07:23: exact
  • 07:24: data frame but this
  • 07:27: is definitely a little tedious to create
  • 07:30: individual lists for each one of those
  • 07:32: inputs
  • 07:33: and assign those individually when you
  • 07:35: can just
  • 07:36: do it all together here single list
  • 07:39: append the tuple
  • 07:40: just give it the column names and you're
  • 07:42: done and this
  • 07:44: will continue to scale for as many
  • 07:46: different things as you want to track at
  • 07:48: once in a single loop
  • 07:51: so now we have some logic which can
  • 07:54: produce
  • 07:55: our data frame let's wrap this up in a
  • 07:57: function
  • 07:58: so let's call this um get
  • 08:01: salaries wealth df
  • 08:05: and it takes the data and now we can
  • 08:08: just indent all this
  • 08:10: and add the return before the df
  • 08:15: and then when we call this um get
  • 08:18: salaries well cf
  • 08:19: on the model data then we can see we get
  • 08:22: the same
  • 08:23: data frame coming out of that and so now
  • 08:26: because we set that up with general data
  • 08:28: you can
  • 08:31: give it any values you want say your
  • 08:35: your high roller your starting salary
  • 08:37: starts
  • 08:38: at 100 000 now we can see
  • 08:42: the data frame of all those results over
  • 08:45: time
  • 08:45: based on whatever different
  • 08:48: values we want to give to it so that's
  • 08:51: why we make sure
  • 08:52: to
  • 08:56: structure everything in this way where
  • 08:57: it can take any arbitrary data
  • 08:59: and not always the original model data
  • 09:02: but anyway so now we can
  • 09:08: get a data frame which contains our
  • 09:11: salaries and wealth over time
  • 09:13: with a single command
  • 09:18: so then
  • 09:21: maybe say below this in the next cell
  • 09:23: we'll just define that here
  • 09:25: and then we can do uh df equals that
  • 09:28: on the model data
  • 09:32: so then again we have that data frame so
  • 09:35: the next thing that we might
  • 09:36: want to do or we definitely want to do
  • 09:38: if we're showing this off to the reader
  • 09:41: is to add some number formatting here uh
  • 09:44: we don't need we don't care about these
  • 09:46: decimals and
  • 09:48: we don't really want the scientific
  • 09:49: notation coming in here
  • 09:51: and these are all uh dollar amounts so
  • 09:53: it'd be nice to have the dollar sign
  • 09:55: on those values so
  • 09:59: um below this we can
  • 10:04: out what we're going to do so
  • 10:07: df.style.format
  • 10:08: then you pass the dictionary where you
  • 10:11: give it the column names
  • 10:13: and then you as the values
  • 10:16: is how you want to format that column
  • 10:19: so i want to put a dollar sign in front
  • 10:21: and i want to have zero decimal places
  • 10:23: and i want to have commas
  • 10:25: and i want that to be a fixed zero
  • 10:27: decimal places
  • 10:29: and then um want to do the same thing
  • 10:33: for the wealth the same format is going
  • 10:36: to be fine there
  • 10:38: as well
  • 10:41: so then when i do that then we can see a
  • 10:44: much nicer
  • 10:44: representation of those values over time
  • 10:51: another thing which we could possibly do
  • 10:54: here to help understand these results
  • 10:57: is to add the
  • 11:00: inline bar graph here um so we can do it
  • 11:03: we don't want it on the year column
  • 11:04: we just want it on the salary and wealth
  • 11:07: columns
  • 11:08: um and then we can see you know it makes
  • 11:11: it a lot easier to see those jumps for
  • 11:13: the promotions right we see this
  • 11:15: immediate break in
  • 11:18: the length of the bars representing that
  • 11:20: the salary or wealth
  • 11:22: jumped and
  • 11:27: we might also want to hide the index
  • 11:30: here
  • 11:30: the index is not useful here
  • 11:34: and now we've got a pretty nice display
  • 11:36: of these data
  • 11:39: so now what i recommend just always for
  • 11:42: our data frame
  • 11:43: styling is we want to wrap this in a
  • 11:45: styler function
  • 11:47: so let's
  • 11:50: style salaries wealth can be the name of
  • 11:53: the function
  • 11:54: and it takes a data frame and then i'm
  • 11:56: going to
  • 11:58: indent all of that and then i'm going to
  • 12:00: return the result of that
  • 12:04: so then that's defined
  • 12:07: so then we can
  • 12:11: uh below this then we're getting our
  • 12:13: data frame
  • 12:14: and then we want to style the
  • 12:18: data frame and that's what we're looking
  • 12:20: at
  • 12:21: so now two lines of code here off of any
  • 12:24: set of data that we want
  • 12:25: and we can get this nice styled
  • 12:28: uh representation of the salaries and
  • 12:31: wells over time
  • 12:35: so then the other thing that we want to
  • 12:37: do is
  • 12:38: plot our results so
  • 12:45: we have our original data frame still
  • 12:48: and this has the original values it's
  • 12:49: not the styled one you can't do any
  • 12:51: plotting on the styler object
  • 12:53: only on the data frame itself and we
  • 12:56: want to do a plot
  • 12:58: and let's do a line plot and we want our
  • 13:01: x to be the year and
  • 13:04: let's uh just plot the salary
  • 13:07: right now so there we can see the
  • 13:10: salaries
  • 13:11: over time now i could have
  • 13:15: plot both the salary and wealth on a
  • 13:17: single graph
  • 13:18: but we can see there's definitely a
  • 13:21: scale problem here
  • 13:22: where the wealth has a so much larger
  • 13:24: scale
  • 13:25: than the salaries and so it doesn't
  • 13:29: make a lot of sense to put these on the
  • 13:30: same graph so we can have the salaries
  • 13:33: and then we can have the wealth separate
  • 13:37: from that so
  • 13:40: we can plot the wealth here separately
  • 13:46: so line line plots are perfectly good
  • 13:49: for this you know you could
  • 13:50: do something like a bar or an area
  • 13:54: plot instead but i think the line
  • 13:57: is perfectly fine for this
  • 14:02: um and now we have
  • 14:05: a nice display of all of our
  • 14:09: information in the model
  • 14:12: so that's a an overview on how to add
  • 14:15: visualization
  • 14:16: to an existing python model using pandas
  • 14:21: and this uh concludes our lecture series
  • 14:24: on understanding complex results through
  • 14:27: visualization
  • 14:28: so thanks for listening and see you next
  • 14:33: time

Lab Exercises


Notes

  • Complete all the exercises in the Pandas and Visualization Labs Jupyter notebook

Transcript

  • 00:02: hey everyone
  • 00:03: nick dearbertus here teaching you
  • 00:05: financial modeling and today
  • 00:07: i'm going to be quickly going over the
  • 00:09: lab exercises
  • 00:10: for the section of the course on
  • 00:13: understanding complex results
  • 00:15: with visualization so the
  • 00:18: lab exercises here all center around the
  • 00:21: python side
  • 00:23: and they are represented here on slides
  • 00:26: in three different exercises
  • 00:30: ones related to getting started with
  • 00:32: pandas ones related to
  • 00:33: styling penis data frames and ones
  • 00:36: related to
  • 00:38: graphing with pandas now all of these
  • 00:41: are within the uh pandas and
  • 00:45: visualization
  • 00:46: labs that you can get from the course
  • 00:48: website
  • 00:49: or the link here in the slides
  • 00:53: and so that's all this jupiter notebook
  • 00:56: here
  • 00:56: pandas and visualization lab exercises
  • 00:59: so you'll want to complete all the
  • 01:01: exercises which
  • 01:03: are within here here we have four
  • 01:06: exercises
  • 01:07: on intro to pandas four on styling and 3
  • 01:11: on the graphing and they kind of go
  • 01:15: together in that
  • 01:16: you and a lot of them use the result of
  • 01:18: a prior one
  • 01:19: to continue forward so make sure to
  • 01:22: start from the beginning
  • 01:25: so that's a quick overview on the lab
  • 01:27: exercises
  • 01:28: for understanding complex results with
  • 01:31: visualization
  • 01:32: thanks for listening and see you next
  • 01:36: time