Data Pipelines

Covers how to collect, clean, and structure the data for your model in an automated way, using Python.

Creating a Data Loading Module for Capital IQ Debt Details


Notes

  • For the DCF model we have explored using the market value of individual debt instruments to estimate the market value of debt for the whole company

  • For Project 3, we looked at using Capital IQ as the source of those debt details

  • Here I examine how to automate the loading and cleaning of the debt details from Capital IQ, so they are ready to be worked with in the model

  • I also look at taking the resulting code and making a Python module, so it can be reused in future models

  • To clean up the data, I use a combination of Pandas methods, string methods, and regular expressions (regex)

Transcript

  • 00:03: hey everyone this is nick dear burtis
  • 00:05: teaching you financial modeling
  • 00:07: today we're going to be looking at how
  • 00:09: to load in capital iq
  • 00:11: details into python to work with them
  • 00:14: and also how to
  • 00:16: structure our code into a python module
  • 00:19: such that this code can be
  • 00:20: reused anytime that we want to load in
  • 00:22: debt details from capital iq
  • 00:24: regardless of the model this is part of
  • 00:27: our lecture segment
  • 00:28: on data pipelines so
  • 00:32: the data that we're looking at loading
  • 00:33: in here
  • 00:35: comes in the form of this excel
  • 00:37: spreadsheet
  • 00:38: and within that spreadsheet we have
  • 00:41: basically some
  • 00:42: header area which doesn't really matter
  • 00:45: to us
  • 00:45: we don't care about that what we want to
  • 00:47: get out of this
  • 00:49: is this table which in which is in here
  • 00:53: and there's a few things that we're
  • 00:54: going to have to do in order to clean up
  • 00:56: this data so we can
  • 00:58: actually work with it in our model
  • 01:00: before we get to that
  • 01:01: just want to note that there is
  • 01:03: typically a second table
  • 01:05: in here that represents older data so we
  • 01:08: don't want to use
  • 01:08: any of the data from that table only the
  • 01:11: top table
  • 01:13: and then some of the issues that we're
  • 01:15: going to have to deal with is the fact
  • 01:17: that
  • 01:19: these coupon rates actually have the
  • 01:22: percentage sign
  • 01:23: in the string it's not
  • 01:26: just formatted as a percent it actually
  • 01:28: has the percent sign in there
  • 01:31: uh so when we load it into python we
  • 01:33: will also see that percent sign in there
  • 01:38: and the um floating rates
  • 01:42: we have live or plus some rate
  • 01:46: so we're going to want to be able to
  • 01:47: take that and
  • 01:49: plug in a libor rate and get what the
  • 01:51: resulting rate is
  • 01:54: and then we also have these coupon rate
  • 01:56: ranges
  • 01:58: where for the purposes of our model
  • 02:00: we'll just take the midpoint of that
  • 02:02: range
  • 02:03: and use that as the coupon for the bond
  • 02:08: then one other thing is that in order to
  • 02:13: do value calculations with these bonds
  • 02:15: we're going to want to get
  • 02:17: uh that into a number of years or
  • 02:20: maturity
  • 02:21: and so we're going to be converting
  • 02:22: these dates into a number of years
  • 02:24: from today um and then
  • 02:27: some of the maturities only give a year
  • 02:30: and so we're going to assume those are
  • 02:32: at the end of the year
  • 02:33: and handle those that way
  • 02:37: so that's an overview of all the issues
  • 02:40: that we face
  • 02:41: in loading this in so now let's look at
  • 02:44: some python code
  • 02:46: to make that happen and um
  • 02:50: throughout doing this then we're going
  • 02:52: to try to make sure that it's standard
  • 02:55: and reusable such that even though this
  • 02:58: is for walmart we can then do the same
  • 03:00: thing
  • 03:01: or we can then use the same code to load
  • 03:03: any other company's
  • 03:05: details and have them all cleaned up and
  • 03:06: ready to use for our model
  • 03:11: so coming over to the jupiter notebook
  • 03:15: we've talked about two approaches in the
  • 03:17: course which will be useful
  • 03:19: to load this information in we can load
  • 03:21: it in using pandas
  • 03:23: or we can use excel wings
  • 03:28: i will generally go to pandas for any
  • 03:31: usage that
  • 03:32: doesn't require back and forth between
  • 03:34: python
  • 03:35: and excel wings for the most part unless
  • 03:39: i'm just trying to grab a single value
  • 03:41: out of excel then sometimes i'll still
  • 03:43: use excel wings
  • 03:45: but pandas generally is a lot more
  • 03:46: compatible
  • 03:48: you can run the code on linux or
  • 03:50: somewhere else that excel is not
  • 03:52: installed
  • 03:54: and still get the same result so i
  • 03:56: generally will prefer pandas and that's
  • 03:57: what i'll use here in this video
  • 04:02: so import pandas and then i'm going to
  • 04:07: load uh
  • 04:10: the file into a data frame
  • 04:14: the walmart that details i have it here
  • 04:17: in the same folder
  • 04:19: um and then we'll see what we get
  • 04:22: so we can see right away that this does
  • 04:25: not look good right we got
  • 04:27: a bunch of junk in here this is not at
  • 04:29: all the table that we wanted to get
  • 04:31: we've got all these on names we got a
  • 04:33: bunch of missing values in here
  • 04:35: um basically what we're seeing here
  • 04:38: corresponds to
  • 04:40: this top part where indeed there are no
  • 04:43: column headers and we've just got some
  • 04:44: junk up here we don't care about
  • 04:46: the actual table here starts in row
  • 04:50: 13. so
  • 04:53: one thing i can do to
  • 04:56: cut off this entire top area is there a
  • 05:00: lot of
  • 05:01: options for the read excel command as
  • 05:04: you can see
  • 05:05: um one such of those options is skip
  • 05:07: rows
  • 05:08: so we can tell it we want to skip
  • 05:11: 12 rows while loading it in and just
  • 05:13: start from the 13th row which is where
  • 05:15: we know the data starts
  • 05:18: so now we can see that we got the actual
  • 05:22: names of the columns in that table that
  • 05:24: looks good and it looks like we got the
  • 05:26: values of that table as well
  • 05:28: um now what we can also see is we have
  • 05:33: you know these last two rows are junk so
  • 05:35: let's say um
  • 05:37: you know 127 rows counting the zero
  • 05:41: and then let's hop over and look at how
  • 05:44: many rows
  • 05:46: are in this top table
  • 05:49: and we have 62 rows so clearly we're
  • 05:52: picking up
  • 05:53: this bottom table as well which we don't
  • 05:57: want
  • 05:58: [Music]
  • 06:00: so there's a number of ways that we can
  • 06:02: choose to
  • 06:04: handle that
  • 06:05: [Music]
  • 06:07: i one thing that we can do
  • 06:11: i see
  • 06:15: that well let's look at how the data
  • 06:17: frame looks
  • 06:18: in that middle area so
  • 06:22: of course the with the data frame
  • 06:23: display the middle gets cut out
  • 06:25: so we can use iloke to
  • 06:28: look at some range of rows so we know
  • 06:31: that was
  • 06:32: 63 rows so let's just look at row 60 to
  • 06:36: 70.
  • 06:38: um so now we can clearly see here
  • 06:41: the table ending and then we can see
  • 06:46: a number of um blank rows
  • 06:50: um and then we have the next table
  • 06:53: starting
  • 06:56: um so there's a number of ways
  • 06:59: we can deal with this to be able to say
  • 07:03: let's cut off everything after
  • 07:07: the table the first table
  • 07:10: so ultimately it's really easy to drop
  • 07:13: rows which have all missing values
  • 07:15: so i'm not too worried about these ones
  • 07:17: once i drop the missing values these
  • 07:19: will be gone
  • 07:20: um it's once we're getting to here
  • 07:27: and we want to try and pick something
  • 07:29: that's going to be consistent
  • 07:31: as we load in data for different
  • 07:33: companies or different time periods
  • 07:36: so you know this is the first kind of
  • 07:38: distinctive marker that we see
  • 07:41: for the second table but we can see this
  • 07:43: is very specific to the date
  • 07:45: so perhaps not a good choice to try and
  • 07:47: target
  • 07:49: now we do have this description here in
  • 07:51: the next row
  • 07:53: signifying that we're starting the next
  • 07:54: table
  • 07:56: so that seems like an appropriate thing
  • 07:58: to target
  • 07:59: um because we can assume that that is
  • 08:02: going to be the same
  • 08:03: for other time periods and other
  • 08:05: companies as well
  • 08:07: um so we can try to target
  • 08:10: that row
  • 08:13: so that row is going to be
  • 08:19: what is going to be where the
  • 08:24: description
  • 08:26: is equal to description
  • 08:30: right um so by that
  • 08:34: we're now able to target this second
  • 08:36: header row
  • 08:38: um and that way we know we've gotten
  • 08:40: into the next table
  • 08:43: but we don't want to cut off just that
  • 08:45: we want to start cutting off
  • 08:46: from the row before um
  • 08:51: so if we
  • 08:54: take the index from that we get
  • 08:57: row 67
  • 09:00: [Music]
  • 09:02: we pull the first value out of that we
  • 09:04: get actually the number 67
  • 09:07: and then if we subtract 1 now we get row
  • 09:10: 66
  • 09:12: which is the first row that we don't
  • 09:14: want to keep
  • 09:17: so there then we can
  • 09:19: [Music]
  • 09:21: say that that's our
  • 09:24: fiscal year row index
  • 09:28: because now we're talking about row 66
  • 09:30: here
  • 09:33: um and now we can
  • 09:35: [Music]
  • 09:37: do we can get all of the data frame up
  • 09:40: until
  • 09:41: that fiscal year row index
  • 09:44: so now we'll have those blanks on the
  • 09:46: end but we do not have that fistful
  • 09:48: year row and we do not have the second
  • 09:50: table
  • 09:51: at all
  • 09:55: so that's a good step we've removed that
  • 09:57: second table
  • 09:59: but now we also want to remove these
  • 10:01: blank rows thankfully pandas has
  • 10:04: a convenient method for this drop n a
  • 10:07: um but we want to make sure that we're
  • 10:09: still keeping
  • 10:10: these even if there's no floating rate
  • 10:12: it can still be a valid
  • 10:14: row so we only want to um drop
  • 10:18: when all are missing so we put how
  • 10:20: equals all
  • 10:23: and now we can see that those other rows
  • 10:26: get dropped as well and yet we are
  • 10:27: keeping the
  • 10:28: rows which just have some missing values
  • 10:33: so we can do df equals
  • 10:37: that in order to save it back into the
  • 10:39: result
  • 10:42: um and that way we've got
  • 10:46: now just the first table
  • 10:53: so
  • 10:56: just kind of looking through um one
  • 11:00: other issue that we see in here
  • 11:02: is we have these dashes um
  • 11:05: and pandas this nan represents a missing
  • 11:08: value
  • 11:08: a dash represents that's actually the
  • 11:11: string dash
  • 11:12: and so if we try and do any math with
  • 11:15: these columns which have the dash in
  • 11:17: them
  • 11:18: it is not going to work for example
  • 11:24: so this is the principal do usd column
  • 11:32: so if we try to sum that we're going to
  • 11:34: get an error
  • 11:36: and it's actually going to say you can't
  • 11:38: add a float in a string and that string
  • 11:40: that it's talking about
  • 11:41: is that dash which is in there
  • 11:46: so in order to fix that we can use
  • 11:48: replace
  • 11:49: on the data frame so
  • 11:52: with this same look at the data
  • 11:55: if we do a replace of dash
  • 11:59: with uh we do have to go import numpy as
  • 12:03: mp
  • 12:06: and have to uh
  • 12:09: run that
  • 12:13: um then
  • 12:17: uh the missing value that pandas uses is
  • 12:21: numpy dot nan
  • 12:24: so we're replacing the dash which we're
  • 12:26: seeing here in row
  • 12:27: 16 with a nand let's see how that looks
  • 12:31: now row 16 we have a man in each of
  • 12:34: those spots which previously had dashes
  • 12:38: so let's save that back into the data
  • 12:41: frame
  • 12:46: so df equals df.replace
  • 12:50: um
  • 12:53: let me just load it and then clean it
  • 12:56: again
  • 12:58: so now um that looks good we have the
  • 13:00: right number of rows
  • 13:02: and now let's try to do this sum again
  • 13:05: now the sum actually
  • 13:06: works because we replace the dash with
  • 13:08: the missing
  • 13:10: um so i think
  • 13:14: this represents a nice concrete step
  • 13:18: in the process um so let's go ahead and
  • 13:21: wrap that up
  • 13:22: into a function so
  • 13:26: um load details
  • 13:30: we can give that a file path
  • 13:34: um and it's going to
  • 13:37: use read excel and do all these other
  • 13:40: things
  • 13:40: which we just did
  • 13:43: and ultimately return the data frame
  • 13:46: instead of
  • 13:47: that specific file we're going to use
  • 13:49: the file path
  • 13:51: um so now we can do df equals load
  • 13:54: that details on that file path
  • 13:57: and we should have the same thing which
  • 13:59: we do
  • 14:02: great so now we have to start tackling
  • 14:06: some of those other cleanup pieces that
  • 14:08: i mentioned
  • 14:10: so
  • 14:15: let's next tackle um
  • 14:20: the um
  • 14:24: coupons so
  • 14:27: for the coupons we have
  • 14:30: a couple things to deal with let's look
  • 14:34: back at the
  • 14:35: sheet over here so we have
  • 14:39: the fact that we have this percent sign
  • 14:40: on everything and then we also have
  • 14:42: these
  • 14:43: ranges to deal with
  • 14:48: so there's a lot of different ways to
  • 14:50: tackle
  • 14:51: this problem um
  • 14:56: you know you could just uh
  • 14:59: use like string methods like we can
  • 15:03: um we have a string
  • 15:06: 4.0 percent we can do a strip
  • 15:10: and strip off the percent sign and then
  • 15:12: you're left without it
  • 15:14: and then you can take float of
  • 15:18: all that and we get back to the number
  • 15:21: and then divide by 100
  • 15:24: and we get that actual um percentage
  • 15:29: so we can use this kind of approach but
  • 15:30: this is not going to work
  • 15:32: for the range uh where we have the range
  • 15:36: like 4.5 percent to 5.5 percent
  • 15:41: um so we could use similar
  • 15:44: approaches along with split split
  • 15:48: will let us take this string and split
  • 15:51: it into a list
  • 15:53: on a delimitator so here we're putting
  • 15:55: the space dash space as a delimiter and
  • 15:58: now we have these two
  • 16:00: uh coupon rates which we can do that
  • 16:01: kind of approach with
  • 16:03: and then um take the midpoint
  • 16:06: of the range um
  • 16:11: so that works perfectly fine but then
  • 16:13: you have to
  • 16:14: come up with a way to detect which of
  • 16:17: these two styles of strings you got
  • 16:19: um you could do a split and then check
  • 16:22: the length
  • 16:23: uh because if you just split that you're
  • 16:26: gonna
  • 16:26: get a list of length one if you split
  • 16:28: that you have a list of length two so
  • 16:30: you could
  • 16:30: compare off that to make the distinction
  • 16:34: um so that's one possible approach
  • 16:37: um the approach that i'm going to look
  • 16:40: at
  • 16:40: in this video is actually using regular
  • 16:42: expressions
  • 16:44: so regular expressions are kind of a
  • 16:46: mini language within python
  • 16:48: and they also exist in many other
  • 16:51: programming languages
  • 16:52: and actually have nearly the same syntax
  • 16:54: across
  • 16:55: different programming languages and
  • 16:58: they're a way of
  • 17:00: parsing strings they sleela you match
  • 17:04: any kind of pattern and strings and
  • 17:06: extract any parts out of them
  • 17:08: that you want the downside being that
  • 17:11: they do tend to be a little bit
  • 17:12: complicated to write
  • 17:14: and read because they're their own
  • 17:17: little mini language um
  • 17:20: but they're extremely powerful so i do
  • 17:23: recommend
  • 17:23: um taking some time to try and learn
  • 17:25: them
  • 17:28: and i think i will be releasing at some
  • 17:30: point in the future a video
  • 17:32: dedicated to a regular expression so i'm
  • 17:34: not going to spend a whole lot of time
  • 17:35: on them here
  • 17:38: but we want to write a regular
  • 17:39: expression to be able to match
  • 17:41: this pattern of a percent
  • 17:44: and a dash another percent and then
  • 17:47: extract the two numbers
  • 17:48: out of that um
  • 17:52: so in order to write regular expressions
  • 17:54: i like to use
  • 17:55: a little tool to help me in that
  • 17:58: tool so reg x is is the abbreviation for
  • 18:02: regular expressions
  • 18:04: and regexpal is the tool
  • 18:07: that i like to use to help out
  • 18:10: so this tool is nice because you put
  • 18:14: your whatever string you're trying to
  • 18:17: match
  • 18:17: down here and then you start typing your
  • 18:20: regular expression
  • 18:21: up here and
  • 18:25: it will show what you're matching so
  • 18:27: again this is not
  • 18:28: a video to teach you regular expressions
  • 18:30: so the syntax is
  • 18:32: um you know not going to make any sense
  • 18:34: to you but
  • 18:35: you'll see kind of how this works um
  • 18:38: so backslash d will match numbers
  • 18:42: and so now you can see it's highlighting
  • 18:43: the numbers here
  • 18:45: and if i put a dot
  • 18:48: then you can see it's matching the dots
  • 18:50: as well number
  • 18:51: and then dot
  • 18:55: well actually uh dot means any single
  • 18:58: character
  • 18:59: if i escape it with a backslash then it
  • 19:01: means the literal dot
  • 19:03: um and when you highlight um
  • 19:06: over this and regexpal it also explains
  • 19:09: to you what you have here
  • 19:13: and so we then
  • 19:16: [Music]
  • 19:20: can make a character group out of this
  • 19:24: and then say that it repeats one or more
  • 19:26: times
  • 19:27: and now we're matching uh you know
  • 19:31: any number here regardless of how many
  • 19:33: decimal places are there on it
  • 19:36: and then we're going to have a percent
  • 19:38: sign and then space dash space
  • 19:42: and then uh you know basically the same
  • 19:44: thing
  • 19:45: over here yeah entire same thing over
  • 19:48: here
  • 19:50: um
  • 19:52: so what am i missing here
  • 19:59: there we go and then percent and now
  • 20:01: it's matching clearly
  • 20:03: so this is our regular expression
  • 20:08: and then one more thing that we want to
  • 20:10: do is put groups in here
  • 20:12: and groups are going to allow us to
  • 20:13: extract certain parts
  • 20:15: of that string so ultimately you want to
  • 20:17: extract
  • 20:18: this here and this here so let's put
  • 20:21: those into groups with parentheses
  • 20:24: um so that whole thing is grouped not
  • 20:27: the percent sign
  • 20:28: and over here uh same thing not the
  • 20:31: percent sign so now we have our regular
  • 20:35: expression and we can bring that
  • 20:36: back over into python
  • 20:41: so this would be
  • 20:44: the range pattern
  • 20:48: um and i'm going to import re for
  • 20:51: regular expressions
  • 20:53: and it's going to be re.compile is what
  • 20:55: we're going to do here
  • 20:57: again i'm really glossing over the
  • 20:58: regular expression stuff i plan to do a
  • 21:00: whole video on that later
  • 21:04: but you're just seeing quickly how it
  • 21:05: can be used here
  • 21:08: so then we have uh our string
  • 21:12: so let's just find that as a and then
  • 21:16: uh or let's say that's coupon string
  • 21:19: give it a better name
  • 21:21: um so then we can do
  • 21:24: uh range pattern dot match
  • 21:28: on the coupon string and
  • 21:32: when we get a match we're gonna see
  • 21:33: something like that
  • 21:35: um when we don't
  • 21:40: uh then you're going to get none
  • 21:44: as your result um
  • 21:48: so what you can do
  • 21:52: is then say that's the match
  • 21:55: um and then if there's a match um
  • 21:59: then we can extract the result from it
  • 22:02: so then
  • 22:03: um the bottom percent uh
  • 22:06: is going to be match.group uh one group
  • 22:09: zero is the entire thing group one is
  • 22:12: the first spot you put parentheses
  • 22:14: um and then the top percent is going to
  • 22:17: be match.group
  • 22:19: two um and let's just make sure that we
  • 22:23: have those
  • 22:24: uh correct yep we got those
  • 22:28: and we can then take the float
  • 22:32: of each yep so now we have the numbers
  • 22:38: and then the midpoint is going to be
  • 22:40: equal to
  • 22:42: the top percent plus the bottom uh
  • 22:45: percent divided by two
  • 22:49: so the midpoint we should get uh five
  • 22:51: for that which we do
  • 22:55: so now we can make this into a function
  • 22:59: um so we can call this standardized
  • 23:02: fixed
  • 23:03: coupons fix the coupon string as an
  • 23:06: argument
  • 23:10: okay
  • 23:14: and so we're going to return the
  • 23:15: midpoint in that case
  • 23:17: but what about in the case where we
  • 23:18: don't have a match so we basically have
  • 23:21: two alternatives here
  • 23:22: and the other alternative is this where
  • 23:25: um we already looked at
  • 23:30: um for that we just do this kind of
  • 23:32: approach so
  • 23:34: um if it's not a match then we're going
  • 23:37: to
  • 23:38: turn this on the coupon string
  • 23:44: let's just save the divided by 100 for
  • 23:46: later
  • 23:49: so now we have a function where we can
  • 23:53: give it say 2.5 it gets the 2.5 out of
  • 23:57: that or we can give it
  • 23:59: a range and it's going to get the
  • 24:01: midpoint
  • 24:02: of that range
  • 24:06: so now we can
  • 24:08: [Music]
  • 24:09: apply this to the correct column
  • 24:13: so that's the coupon base right column
  • 24:20: so we're going to apply standardizing
  • 24:23: the fixed coupons and we get an
  • 24:32: error um
  • 24:34: [Music]
  • 24:35: expected string or bytes like object
  • 24:39: um so
  • 24:40: [Music]
  • 24:43: i'm just going to quickly look at what
  • 24:45: that error was
  • 24:47: using the built-in debugger uh and we
  • 24:50: can see it's because we're getting a
  • 24:52: nand here there is a missing value so we
  • 24:55: do have to handle that
  • 24:56: in here as well if pd dot
  • 25:00: is null coupon string
  • 25:03: then we want to return the coupon string
  • 25:08: okay now this is working appropriately
  • 25:13: okay um so
  • 25:17: we can
  • 25:18: [Music]
  • 25:20: um then have this
  • 25:24: um as part of our
  • 25:28: cleanup the coupon base rate
  • 25:33: we're going to apply this
  • 25:38: okay so um
  • 25:47: now we want to handle next
  • 25:51: um let's look at the floating rate
  • 25:55: coupons next so for the floating rate
  • 25:59: we have a similar thing here
  • 26:05: where for the floating rate
  • 26:09: most the time here we just have this
  • 26:12: um n a and
  • 26:16: that has been converted to nam over here
  • 26:18: so
  • 26:20: it already has missing values
  • 26:22: appropriately
  • 26:26: but what we're going to need to handle
  • 26:27: is these libor plus
  • 26:31: so we can use a similar regular
  • 26:34: expression
  • 26:34: approach so i'll bring that over to
  • 26:38: regex pal
  • 26:40: um let's now put
  • 26:44: that as our test string and so what
  • 26:46: we're looking for
  • 26:47: is the string libor plus
  • 26:50: but plus is a special character in regex
  • 26:53: so you gotta put a backslash
  • 26:55: um and then we're going to have numbers
  • 26:59: as well as uh decimals and
  • 27:02: uh one or more of those and
  • 27:06: then a percent sign so that's our
  • 27:08: regular expression
  • 27:10: for that um
  • 27:14: so that's going to be the libor pattern
  • 27:23: okay and i'll move this up here as well
  • 27:26: since it's also done
  • 27:29: um and now we
  • 27:33: can use a similar approach so libor
  • 27:35: let's
  • 27:37: [Music]
  • 27:40: give that example coupon string
  • 27:43: live or plus
  • 27:47: and try a similar approach so
  • 27:50: we have the libor pattern that match
  • 27:55: the coupon string we can see that's a
  • 27:57: match
  • 27:58: um so that's the match and then if match
  • 28:03: um and we got to add our group in here
  • 28:06: so here the group is is going to be
  • 28:08: around the number again
  • 28:12: um so if match
  • 28:15: then the percentage
  • 28:19: is going to be the float of match.group
  • 28:22: one
  • 28:25: and then let's just say that libor
  • 28:30: was one percent
  • 28:33: um then we're going to
  • 28:38: return
  • 28:42: the library rate
  • 28:46: times 100 because this is in whole
  • 28:48: percentage points
  • 28:50: plus the percentage and
  • 28:53: not quite returning yet we're not
  • 28:55: writing our function
  • 28:56: um so this would be the
  • 29:00: you know the overall rate and then we
  • 29:03: can look at that rate
  • 29:07: libor just libor
  • 29:11: and we get 1.75 so that appears to be
  • 29:13: working correctly
  • 29:16: so let's uh wrap that into a function
  • 29:20: so standardize
  • 29:24: loading coupons
  • 29:27: takes the coupon spring as well
  • 29:30: as libor um
  • 29:34: and it does this and
  • 29:38: you know just in case there are just
  • 29:41: normal rates in there
  • 29:42: we can throw this in there
  • 29:45: and we'll also throw our catch for
  • 29:48: missing values
  • 29:49: as well
  • 29:55: okay so now
  • 29:58: we should be able to take our floating
  • 30:00: rate column
  • 30:05: and apply standardized floating coupons
  • 30:14: and pass the libor rate
  • 30:19: and it seems it did not work
  • 30:22: appropriately um for some reason
  • 30:27: our string was getting through without a
  • 30:28: match oh because we never returned
  • 30:31: okay so then we want to return the rate
  • 30:33: and
  • 30:34: now it seems to be working uh let's just
  • 30:37: drop an a in there so we can see the two
  • 30:40: yeah they're both live or plus
  • 30:42: uh 0.75 and so that's correct if we put
  • 30:46: a
  • 30:46: two percent rate now we're getting 2.75
  • 30:49: percent so
  • 30:50: everything looks correct
  • 30:54: um so we can now add this into our
  • 30:57: cleaning approach
  • 30:58: so yeah floating rate
  • 31:02: he pulls that
  • 31:06: um so now this is starting to get
  • 31:10: fairly cleaned up we have the coupon and
  • 31:12: the floating rates
  • 31:13: both handled well
  • 31:18: now we want to um
  • 31:21: combine the two um
  • 31:24: so we can create a coupon column
  • 31:29: so that's going to be equal to
  • 31:33: the coupon base rate column
  • 31:36: and then we can say well wherever that's
  • 31:38: missing we can fill it in
  • 31:40: with the floating rate column
  • 31:47: so now we have the coupon right here um
  • 31:51: and it should be taking the values um
  • 32:04: yeah so i got the rose or floating rates
  • 32:06: not missing and we can see
  • 32:07: it is indeed getting filled in for the
  • 32:09: coupon um
  • 32:11: so everything is working appropriately
  • 32:15: um and then in order to do calculations
  • 32:18: with it we want to get it to decimal
  • 32:20: so df coupon equals df coupon
  • 32:25: divided by 100.
  • 32:30: and now that looks good
  • 32:34: so now the coupons are cleaned up and we
  • 32:37: just have the
  • 32:38: maturities to deal with now
  • 32:41: um let me just get all this group
  • 32:44: together
  • 32:47: and uh bring this up with my other
  • 32:50: functions that are now
  • 32:51: working well
  • 32:56: okay um
  • 32:57: [Music]
  • 33:05: so now we've got to deal with the
  • 33:09: maturities
  • 33:11: um and so with the maturities uh we
  • 33:14: talked about
  • 33:15: how um well
  • 33:19: one we got to get it to date time
  • 33:22: instead of just a string is that how it
  • 33:25: is now let me look at the data types
  • 33:28: maturities object so yes it's coming as
  • 33:30: a string
  • 33:32: um and not a date time
  • 33:35: um and then we've got to deal with this
  • 33:38: year
  • 33:38: issue as well um
  • 33:42: so let's see
  • 33:47: um what is the column name
  • 33:52: so maturity
  • 33:58: um
  • 34:04: so
  • 34:07: trying to look for
  • 34:11: where one of these years
  • 34:14: so let's grab 20 that's where we have a
  • 34:18: year
  • 34:18: so what is the type of that
  • 34:22: that's actually coming as an integer
  • 34:26: and then one that's a date
  • 34:29: is actually that is actually coming as a
  • 34:32: date time not as a string
  • 34:36: so
  • 34:40: we want to let these basically
  • 34:44: pass through as they are they're already
  • 34:47: in date formats that's what we want
  • 34:50: um but if we're getting an integer that
  • 34:53: means we have just the year
  • 34:54: and then we've got to convert it um
  • 34:58: so we can use pandas's two date time to
  • 35:01: do that conversion
  • 35:02: for us um so if we just say you know
  • 35:05: december 31st
  • 35:07: of may 20 it will be able to take that
  • 35:10: and turn it into
  • 35:12: a timestamp which will work for the
  • 35:14: daytime purposes
  • 35:17: and so if we you know just have our year
  • 35:23: then we can just format the string to
  • 35:25: drop in the year
  • 35:27: and it will do that conversion for us
  • 35:32: so we can do that along with some type
  • 35:35: checking
  • 35:36: in order to standardize the maturities
  • 35:40: so um you know if
  • 35:44: is instance that's a way we can check
  • 35:46: the type
  • 35:47: of um and here let's make this the more
  • 35:51: general maturity because we're not going
  • 35:53: to
  • 35:54: um know coming in whether it's
  • 35:58: a year or a date time so if it's an
  • 36:01: integer
  • 36:03: then um that's going to be our result
  • 36:08: um and
  • 36:11: otherwise basically it's passing through
  • 36:13: let's just go ahead and make the
  • 36:14: function for that
  • 36:16: standardized maturity dates
  • 36:22: um so if it's an editor then return the
  • 36:26: date time of the end of the year
  • 36:28: there needs to be maturity there
  • 36:31: um if it's not
  • 36:38: uh
  • 36:40: you know i guess it could conceivably
  • 36:42: conceivably get loaded
  • 36:43: as a string so if it's not a string or
  • 36:45: an integer then we're just going to
  • 36:46: return the maturity so if it's a date
  • 36:48: time or if it's missing then just return
  • 36:50: it
  • 36:51: um and then basically we get here if
  • 36:55: it's just a regular string
  • 36:56: and then we can just return it
  • 37:00: um so now
  • 37:06: we give it a year that will give us the
  • 37:08: end of that year
  • 37:09: um if we give it a date time
  • 37:13: then well let's just throw it at the
  • 37:15: data frame and see
  • 37:17: um so df maturity
  • 37:22: dot apply standardized maturity dates
  • 37:26: and we can see this indeed has worked
  • 37:28: for us
  • 37:29: so we didn't get any errors uh we
  • 37:32: remember row 20
  • 37:33: had that year we can see now we're
  • 37:35: getting the end of the year for that
  • 37:38: um and the missing values are passing
  • 37:40: through just fine
  • 37:43: so that becomes the next step here in
  • 37:45: our cleanup process
  • 37:48: we're going to overwrite the maturity
  • 37:49: column with that
  • 37:55: [Music]
  • 38:00: and that has allowed it to become a date
  • 38:01: time type as well now that everything is
  • 38:03: date
  • 38:06: um so then we just have to
  • 38:11: um convert that i'll just you know put
  • 38:14: this over here because we're done with
  • 38:16: it
  • 38:18: then we have to convert the
  • 38:22: um year
  • 38:25: or the the maturity date into a number
  • 38:29: of years from today
  • 38:31: so i'm going to import date time and i
  • 38:33: already covered this
  • 38:34: um in a prior video
  • 38:38: um on on handling the market value of
  • 38:41: debt
  • 38:43: in python so i'm just going to copy the
  • 38:46: function that we had
  • 38:47: from that this uh year until from date
  • 38:51: so it gets today's date and then looks
  • 38:53: the difference between
  • 38:55: the date we're passing to it and today
  • 38:57: and ultimately gives you a number of
  • 38:58: years
  • 39:01: so then we can just apply that
  • 39:05: [Music]
  • 39:07: so we take maturity now that it's
  • 39:10: been all cleaned up and we apply the
  • 39:14: year until from date um
  • 39:18: then we get the number of years so we
  • 39:20: can add that in here
  • 39:22: as df maturity
  • 39:25: years equals that
  • 39:31: and now we should have that coming into
  • 39:33: the data frame as well
  • 39:36: so one last step is to determine
  • 39:41: whether we have enough uh information to
  • 39:44: be able to
  • 39:45: value the bond um
  • 39:49: so there's a few things that will make
  • 39:51: it invalid
  • 39:53: um so
  • 39:57: um if
  • 40:01: um
  • 40:05: uh if the coupon
  • 40:10: is missing um which
  • 40:15: here we can see the rows that have a
  • 40:16: missing coupon then we're not able to do
  • 40:18: anything
  • 40:19: um if the
  • 40:22: maturity years is missing
  • 40:28: that also is invalid we have no way to
  • 40:31: know
  • 40:32: the value if we don't know how long it's
  • 40:33: outstanding
  • 40:35: um and then uh
  • 40:38: if the maturity years is less than
  • 40:41: zero that means the bond is already um
  • 40:44: expired and so those are going to be
  • 40:48: invalid as well so we can
  • 40:52: um use loc in order to
  • 40:54: [Music]
  • 40:55: catch all these three things at once um
  • 41:00: so if we just
  • 41:05: wrap this in parentheses and then put
  • 41:07: this pipe
  • 41:09: operator for or and then we can put
  • 41:11: these other conditions in there as well
  • 41:19: wrap it in parentheses pipe
  • 41:24: and then if the coupon is null
  • 41:31: so now that should catch all three of
  • 41:34: those cases
  • 41:37: um so then we can
  • 41:41: use that to create a column
  • 41:44: which represents whether the row is
  • 41:47: valid
  • 41:48: um so yeah we can see we've got the
  • 41:50: negative maturities we can see we've got
  • 41:52: the missing coupons
  • 41:54: and we can see we've got the missing
  • 41:55: maturities all caught here in one filter
  • 42:01: so what we can do is we can say um
  • 42:05: create a validity column and just set it
  • 42:07: to true
  • 42:08: but wherever any of those things uh
  • 42:11: conditions are meant for it to be
  • 42:12: invalid then we set it to false
  • 42:18: so then we have this validity column
  • 42:22: and you can
  • 42:25: [Music]
  • 42:28: oh i accidentally had it set
  • 42:31: the entire row of false we want to then
  • 42:33: put a comma
  • 42:35: um and then i've just said only the
  • 42:38: valid column
  • 42:38: let me get a new data frame and then do
  • 42:40: this again and
  • 42:42: yes now it was false the rest of the row
  • 42:45: still stays there so we can do other
  • 42:46: things with it like just take it out as
  • 42:48: principle instead of doing an evaluation
  • 42:52: but now we can differentiate between the
  • 42:54: two so let's just create a function out
  • 42:56: of this
  • 42:58: so we can call this a sine validity
  • 43:01: of bonds
  • 43:04: and this is going to just work in place
  • 43:07: so we don't even have to return anything
  • 43:10: um i'm going to put this up there
  • 43:16: um and now we can just call this on the
  • 43:19: data frame add in
  • 43:22: um and now we have that in there
  • 43:25: as well so this is all working
  • 43:28: everything is all cleaned up now it's
  • 43:30: ready for you to go do evaluation from
  • 43:32: here
  • 43:34: but let's just wrap all these steps into
  • 43:37: one further function
  • 43:39: so we can call this load and
  • 43:43: clean uh statement or
  • 43:46: that details and that's going to take
  • 43:49: the file path it's also going to take
  • 43:51: the libor rate
  • 43:53: and it's going to do all these things
  • 43:58: and this is going to be the file path
  • 44:02: and it's already using a library in
  • 44:03: there and then we're going to return the
  • 44:06: data frame
  • 44:07: at the end so now we can just do load
  • 44:10: and clean.details
  • 44:13: on the file path and give it a library
  • 44:15: rate
  • 44:18: and we'll have the data frame all
  • 44:20: cleaned up and ready to go
  • 44:24: so now we have our code working
  • 44:28: to be able to do all this cleanup
  • 44:32: but it's kind of you know a decent
  • 44:35: number of lines of code here
  • 44:37: it's definitely not ideal to to copy
  • 44:39: paste this into every project and if you
  • 44:41: did
  • 44:43: and you needed to make some updates to
  • 44:44: it
  • 44:46: that could cause some issues because
  • 44:49: then you've got to go
  • 44:50: back and update it and every place that
  • 44:52: you've copy pasted it so really you just
  • 44:53: want to have it once
  • 44:54: and be able to use it everywhere so
  • 44:57: that's the perfect application
  • 44:59: of a python module um so
  • 45:04: we can create a module
  • 45:08: um in order to
  • 45:11: use this very easily in any of our
  • 45:14: models
  • 45:15: but real quick before we do that i do
  • 45:18: want to try
  • 45:18: just loading in different dead details
  • 45:21: to make sure it works with pfizer's
  • 45:23: details as well to see if anything is
  • 45:26: different there
  • 45:28: so let's give that a try
  • 45:32: um and there did not work so what is
  • 45:35: going on here
  • 45:36: you can see you got a key error for the
  • 45:38: floating rate um
  • 45:40: so that suggests maybe that the um
  • 45:43: floating rate column is not there
  • 45:49: but let's see let me open these up
  • 45:55: so we can take a look
  • 45:59: and we can see in the pfizer data that
  • 46:01: the floating rate column is there so why
  • 46:03: are we getting an
  • 46:04: error so let's go back over to python
  • 46:08: and we can use the built-in debugger
  • 46:12: in order to check this out so let's get
  • 46:14: back up to our code
  • 46:16: uh so let's look at the columns of the
  • 46:19: data frame
  • 46:20: loading rate is right there isn't that
  • 46:23: weird
  • 46:24: um so this is definitely
  • 46:27: a gotcha that you may face and loading
  • 46:29: in
  • 46:29: [Music]
  • 46:30: um data from uh
  • 46:34: different sources now let me um
  • 46:37: look at that directly oh it looks a
  • 46:39: little bit different now that i pull it
  • 46:41: out
  • 46:42: um it's floating backslash xaoxa zero
  • 46:46: rate what does that mean so uh
  • 46:50: it's uh you know a little outside of the
  • 46:53: score outside of the scope of the course
  • 46:55: to talk in detail about what's going on
  • 46:57: here but basically
  • 46:58: this is another way that white space can
  • 47:01: be represented
  • 47:02: in a file and so
  • 47:05: what looks to us like a space is being
  • 47:07: represented by this backslash
  • 47:09: xa0 so what we can do here while we're
  • 47:13: loading in the columns
  • 47:14: is just replace any of these xa zeros
  • 47:18: with a space and then we shouldn't have
  • 47:21: to worry about it
  • 47:23: so i'm just going to try adding that in
  • 47:24: here uh df.columns
  • 47:28: equals um column.replace
  • 47:32: uh this oh uh backslash xa0
  • 47:36: with a space or call in the update
  • 47:40: columns
  • 47:42: and hopefully that will allow it to work
  • 47:45: awesome
  • 47:45: this time it loaded it and now
  • 47:48: everything seems to be working properly
  • 47:51: here as well
  • 47:54: and then let's just check we can go back
  • 47:55: to walmart and everything is still
  • 47:58: working there as well
  • 47:59: awesome so this is ready to go um
  • 48:03: so now let's let's take this over and
  • 48:05: put it into a python module
  • 48:07: so that we can reuse this in multiple
  • 48:10: different
  • 48:10: models so
  • 48:14: we can go and create a text file
  • 48:17: and then i'm going to rename this file
  • 48:20: uh you want to pick a name without
  • 48:22: spaces
  • 48:23: so i'm going to call this cap iq debt
  • 48:25: and you want the file extension to be
  • 48:27: py that's what makes it a python module
  • 48:32: and then we can just uh take
  • 48:36: all of this that's why it was good to
  • 48:39: gather it all in one cell
  • 48:42: and put it over here um
  • 48:45: and now save this and so now this is
  • 48:48: just a text file
  • 48:50: or the dot py extension which is in the
  • 48:52: same folder
  • 48:56: and now let me totally
  • 49:00: let me just totally start a new jupiter
  • 49:02: notebook
  • 49:03: um so over here we've never
  • 49:07: defined anything um
  • 49:10: and so we don't have any of those that
  • 49:12: loading functions ready
  • 49:14: um but let's try import app iq dat
  • 49:20: so we do that and we can see it was
  • 49:22: successful
  • 49:23: and now when we do cap iq dot we can see
  • 49:27: that we have
  • 49:28: all these functions that we defined
  • 49:31: in what was previously the jupiter cell
  • 49:33: and we brought over to the python module
  • 49:36: so now we can do load and clean debt
  • 49:37: details um
  • 49:39: so df equals and we're going to do that
  • 49:43: on the pfizer
  • 49:45: um that details
  • 49:50: and say our libor rate let's just say
  • 49:52: it's two percent now
  • 49:54: um that's the live one
  • 49:58: and there we go now we have our data
  • 50:00: frame cleaned up ready to go
  • 50:02: and we don't have all that code in this
  • 50:05: jupyter notebook
  • 50:06: so now um we're able to take all this
  • 50:09: detail
  • 50:10: of all this loading cleaning process for
  • 50:13: the data pipeline
  • 50:15: and just offload that so that the reader
  • 50:17: of our model does not have to care
  • 50:19: about how all this works it happens in
  • 50:22: the background
  • 50:24: through this cap iq depth module and now
  • 50:27: this
  • 50:27: module can be reused in as many uh
  • 50:30: different um notebooks as you want
  • 50:34: and if you go and make some changes like
  • 50:37: we had to do
  • 50:38: to be able to handle the father details
  • 50:41: then that's going to go and apply for
  • 50:43: all your different models
  • 50:46: so maybe you decide okay well i'm going
  • 50:48: to add some functionality which prints
  • 50:50: out some summary statistics and
  • 50:53: totals up the principle to make sure
  • 50:55: that you have the right amount of
  • 50:56: principle
  • 50:57: well you just go and change it in this
  • 50:59: module and then
  • 51:00: all your different models which use this
  • 51:02: module are going to get that
  • 51:04: functionality
  • 51:05: so there's a lot of power in
  • 51:08: restructuring
  • 51:09: your code to use modules
  • 51:13: so that's an overview of how we can
  • 51:16: clean up
  • 51:16: capital iq debt details
  • 51:20: 100 percent automated in python so now
  • 51:23: it's ready to use to value
  • 51:25: all the bonds and calculate the total
  • 51:27: market value of debt
  • 51:29: um and we also covered how to
  • 51:32: take that code and offload it into a
  • 51:35: module such that it can be easily reused
  • 51:37: in multiple different projects so
  • 51:41: thanks for listening and see you next
  • 51:48: time