[Music]

This presentation is delivered by the Stanford Center for Professional Development.

Okay. Good morning. Welcome to CS229, the machine learning class. So what I want to do today is just spend a little time going over the logistics of the class, and then we'll start to talk a bit about machine learning.

By way of introduction, my name's Andrew Ng and I'll be the instructor for this class. I personally work in machine learning, and I've worked on it for about 15 years now, and I actually think that machine learning is the most exciting field of all the computer sciences. So I'm always excited about teaching this class. Sometimes I actually think that machine learning is not only the most exciting thing in computer science, but the most exciting thing in all of human endeavor - so maybe a little bias there.

I also want to introduce the TAs, who are all graduate students doing research in or related to machine learning, across all aspects of machine learning. Paul Baumstarck works in machine learning and computer vision. Catie Chang is actually a neuroscientist who applies machine learning algorithms to try to understand the human brain. Tom Do is another PhD student; he works in computational biology and in sort of the basic fundamentals of machine learning. Zico Kolter is the head TA - he's been head TA two years in a row now - and he also works in machine learning and applies it to a bunch of robots. And Daniel Ramage is - I guess he's not here - Daniel applies learning algorithms to problems in natural language processing.

So you'll get to know the TAs and me much better throughout this quarter, but just from the sorts of things the TAs do, I hope you can already tell that machine learning is a highly interdisciplinary topic, in which just the TAs apply learning algorithms to problems in computer vision and biology and robots and language. And machine learning is one of those things that has had, and is having, a large impact on many applications. So just in my own daily work, I actually frequently end up talking to people from helicopter pilots to biologists to people in computer systems or databases to economists, and also an unending stream of people from industry coming to Stanford interested in applying machine learning methods to their own problems.

So yeah, this is fun.
A couple of weeks ago, a student actually forwarded to me an article in Computer World about the 12 IT skills that employers can't say no to. So it's about sort of the 12 most desirable skills in all of IT and all of information technology, and topping the list was actually machine learning. So I think this is a good time to be learning this stuff, and learning algorithms are having a large impact on many segments of science and industry.

I'm actually curious about something. Learning algorithms is one of the things that touches many areas of science and industry, and I'm just kind of curious: how many people here are computer science majors, are in the computer science department? Okay, about half of you. How many people are from EE? Oh, okay, maybe about a fifth. How many biologists are there here? Wow, just a few, not many. I'm surprised. Anyone from statistics? Okay, a few. So where are the rest of you from?

Say again? iCME. Cool. CivE, and what else? Synthesis, [inaudible] systems. Yeah, cool. ChemE. Cool. Aero/astro. Yes, right. Yeah, okay, cool. Anyone else? Pardon? MS&E. All right. Cool. Yeah. Pardon? Oh, I see, industry. Okay. Cool. Great, great.

So as you can tell from the cross-section of this class, I think we're a very diverse audience in this room, and that's one of the things that makes this class fun to teach and fun to be in, I think.

So in this class, we'll try to convey to you a broad set of principles and tools that will be useful for doing many, many things. And every time I teach this class, I can actually very confidently say that after December, no matter what you're going to do after this December, when you've completed this class, you'll find the things you learn in this class very useful, and these things will be useful pretty much no matter what you end up doing later in your life.

So I have more logistics to go over later, but let's say a few more words about machine learning. I feel that machine learning grew out of early work in AI, early work in artificial intelligence. And over the last - I want to say last 15 or last 20 years or so - it's been viewed as a sort of growing new capability for computers. And in particular, it turns out that there are many programs, there are many applications, that you can't program by hand.
For example, if you want to get a computer to read handwritten characters, to read sort of handwritten digits, it actually turns out to be amazingly difficult to write a piece of software to take this input, an image of something that I wrote, and to figure out just what it is - to translate my cursive handwriting, to extract the characters I wrote out in longhand.

And other things: one thing that my students and I do is autonomous flight. It turns out to be extremely difficult to sit down and write a program to fly a helicopter. But in contrast, if you want to do things like get software to fly a helicopter or have software recognize handwritten digits, one very successful approach is to use a learning algorithm and have a computer learn by itself how to, say, recognize your handwriting. And in fact, for handwritten digit recognition, this is pretty much the only approach that works well. It's used for applications that are hard to program by hand.

Learning algorithms have also made, I guess, significant inroads in what's sometimes called database mining. So, for example, with the growth of IT and computers, increasingly many hospitals are keeping around medical records of what sort of patients they had, what problems they had, what their prognoses were, what the outcomes were. And taking all of these medical records, which started to be digitized only about maybe 15 years ago, applying learning algorithms to them can turn raw medical records into what I might loosely call medical knowledge, in which we start to detect trends in medical practice and even start to alter medical practice as a result of medical knowledge that's derived by applying learning algorithms to the sorts of medical records that hospitals have just been building over the last 15, 20 years in an electronic format.

Turns out that most of you probably use learning algorithms - I don't know - I think half a dozen times a day, or maybe a dozen times a day or more, and often without knowing it. So, for example, every time you send mail via the US Postal System, it turns out there's an algorithm that tries to automatically read the zip code you wrote on your envelope, and that's done by a learning algorithm. So every time you send US mail, you are using a learning algorithm, perhaps without even being aware of it.
Similarly, every time you write a check - I actually don't know the number for this, but a significant fraction of checks that you write are processed by a learning algorithm that's learned to read the digits, the dollar amount that you wrote down on your check. So every time you write a check, there's another learning algorithm that you're probably using without even being aware of it.

If you use a credit card - or I know at least one phone company was doing this, and lots of companies like eBay as well that do electronic transactions - there's a good chance that there's a learning algorithm in the background trying to figure out if, say, your credit card's been stolen or if someone's engaging in a fraudulent transaction.

If you use a website like Amazon or Netflix that will often recommend books for you to buy or movies for you to rent or whatever, these are other examples of learning algorithms that have learned what sorts of things you like to buy or what sorts of movies you like to watch, and can therefore give customized recommendations to you.

Just about a week ago, I had my car serviced, and even there, my car mechanic was trying to explain to me some learning algorithm in the innards of my car that's sort of doing its best to optimize my driving performance for fuel efficiency or something. So, you see, most of us use learning algorithms half a dozen, a dozen, maybe dozens of times a day without even knowing it. And of course, learning algorithms are also doing things like giving us a growing understanding of the human genome. So if someday we ever find a cure for cancer, I bet learning algorithms will have had a large role in that. That's sort of the thing that Tom works on, yes?

So in teaching this class, I sort of have three goals. One of them is just to, I hope, convey some of my own excitement about machine learning to you. The second goal is that by the end of this class, I hope all of you will be able to apply state-of-the-art machine learning algorithms to whatever problems you're interested in. And if you ever need to build a system for reading zip codes, you'll know how to do that by the end of this class. And lastly, I realize that only a subset of you are interested in doing research in machine learning, but by the conclusion of this class, I hope that all of you will actually be well qualified to start doing research in machine learning, okay?

So let's say a few words about logistics.
The prerequisites of this class are written on one of the handouts, and are as follows. In this class, I'm going to assume that all of you have sort of basic knowledge of computer science and knowledge of basic computer skills and principles. So I assume all of you know what big-O notation is, that all of you know about data structures like queues, stacks, and binary trees, and that all of you have enough programming skills to, like, write a simple computer program. And it turns out that most of this class will not be very programming intensive, although we will do some programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.

I also assume familiarity with basic probability and statistics. So most undergraduate statistics classes, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna assume all of you know what random variables are, that all of you know what expectation is, what the variance of a random variable is. And in case it's been a while since some of you have seen some of this material, at some of the discussion sections we'll actually go over some of the prerequisites, sort of as a refresher course on the prerequisite material. I'll say a bit more about that later as well.

Lastly, I also assume familiarity with basic linear algebra. And again, most undergraduate linear algebra courses are more than enough. So if you've taken courses like Math 51, 103, Math 113 or CS205 at Stanford, that would be more than enough. Basically, I'm gonna assume that all of you know what matrices and vectors are, that you know how to multiply matrices and vectors and multiply matrices and matrices, and that you know what a matrix inverse is. If you know what an eigenvector of a matrix is, that'd be even better. But if you don't quite know, or if you're not quite sure, that's fine, too. We'll go over it in the review sections.

So there are a couple more logistical things I should deal with in this class. One is that, as most of you know, CS229 is a televised class. And in fact, I guess many of you are probably watching this at home on TV, so I'm gonna say hi to our home viewers.
So earlier this year, I approached SCPD, which televises these classes, about trying to make a small number of Stanford classes publicly available by posting the videos on the web. And so this year, Stanford is actually starting a small pilot program in which we'll post videos of a small number of classes online, on the Internet, in a way that makes them publicly accessible to everyone. I'm very excited about that, because machine learning - let's get the word out there.

One of the consequences of this is that - let's see - videos or pictures of the students in this classroom will not be posted online, so don't worry about seeing your own face appear on YouTube one day. But the microphones may pick up your voices, so I guess the consequence of that is: because the microphones may pick up your voices, no matter how irritated you are at me, don't yell out swear words in the middle of class. But because there won't be video of you, you can safely sit there and make faces at me, and that won't show, okay?

Let's see. I also handed out - there were two handouts I hope most of you have, including the course information handout. So let me just say a few words about parts of these. On the third page, there's a section that says Online Resources.

Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? Testing, testing. Okay, cool. Thanks.

So, all right, online resources. The class has a home page; it's on the handouts, so I won't write it on the chalkboard: http://cs229.stanford.edu. And so when there are homework assignments or things like that - in the interest of saving trees - we will usually not give out many handouts in class. So homework assignments and homework solutions will be posted online at the course home page.

For this class, I've also written, and I guess I've also revised every year, a set of fairly detailed lecture notes that cover the technical content of this class. And so if you visit the course homepage, you'll also find the detailed lecture notes that go over in detail all the math and equations and so on that I'll be doing in class.
There's also a newsgroup, su.class.cs229, also written on the handout. This is a newsgroup that's sort of a forum for people in the class to get to know each other and have whatever discussions you want to have amongst yourselves. The class newsgroup will not be monitored by the TAs and me. But this is a place for you to form study groups or find project partners or discuss homework problems and so on - and it's not monitored by the TAs and me, so feel free to talk trash about this class there.

If you want to contact the teaching staff, please use the email address written down here, cs229-qa@cs.stanford.edu. This goes to an account that's read by all the TAs and me. So rather than sending us email individually, if you send email to this account, it will actually let us get back to you maximally quickly with answers to your questions. If you're asking questions about homework problems, please say in the subject line which assignment and which question the email refers to, since that will also help us route your question to the appropriate TA, or to me, and get the response back to you quickly.

Let's see. Skipping ahead - let's see - there are four homeworks, one midterm, and one open-ended term project. And notice the section on the honor code.

So one thing that I think will help you to succeed and do well in this class, and even help you to enjoy this class more, is if you form a study group. So start looking around where you're sitting now, or at the end of class today, mingle a little bit and get to know your classmates. I strongly encourage you to form study groups and sort of have a group of people to study with, a group of your fellow students to talk over these concepts with. You can also post on the class newsgroup if you want to use that to try to form a study group. But some of the problem sets in this class are reasonably difficult - people that have taken the class before may tell you they were very difficult. And I just think it would be more fun for you, and you'd probably have a better learning experience, if you form a study group of people to work with. So I definitely encourage you to do that.
And just to say a word on the honor code: I definitely encourage you to form a study group and work together, discuss homework problems together. But if you discuss homework problems with other students, then I'll ask you to sort of go home and write down your own solutions independently, without referring to notes that were taken in any of your joint study sessions. So in other words, when you turn in a homework problem, what you turn in should be something that was reconstructed independently by yourself, without referring to notes that you took during your study sessions with other people, okay? And obviously, showing your solutions to others or copying other solutions directly is right out.

We occasionally also reuse problem set questions from previous years, so that the problems are a bit more debugged and work more smoothly. And as a result of that, I also ask you not to look at solutions from previous years, and this includes both the official solutions that we've given out to previous generations of this class and solutions that people who have taken this class in previous years may have written out by themselves, okay?

Sadly, in previous years, there have often been a few honor code violations in this class. And last year, I think I prosecuted five honor code violations, which I think is a ridiculously large number. So just don't work from old solutions, and hopefully there'll be zero honor code violations this year. I'd love for that to happen.

The section here on the late homework policy, if you ever want to hand in a homework late - I'll leave you to read that yourself.

We also have a midterm, which is scheduled for Thursday, the 8th of November at 6:00 p.m., so please keep that evening free.

And let's see. One more administrative thing I wanted to say is about the class project. So part of the goal of this class is to leave you well equipped to apply machine learning algorithms to a problem or to do research in machine learning. And so as part of this class, I'll ask you to execute a small research project, sort of as a small term project.
And what most students do for this is either apply machine learning to a problem that you find interesting or investigate some aspect of machine learning. So to those of you that are either already doing research, or to those of you who are in industry - you're taking this from a company - one fantastic way to do a class project would be to apply machine learning algorithms to a problem that you're interested in, a problem that you're already working on, whether it be a science research problem or a problem in industry where you're trying to get a system to work using a learning algorithm.

To those of you that are not currently doing research, one great way to do a project would be to just pick a problem that you care about, a problem that you find interesting, and apply learning algorithms to that and play with the ideas and see what happens.

And let's see. Oh, and the goal of the project should really be for you to do a publishable piece of research in machine learning, okay?

And if you go to the course website, you'll actually find a list of the projects that students did last year. I'm holding the list in my hand; you can go home later and take a look at it online. But reading down this list, I see that last year there were students that applied learning algorithms to control a snake robot. There were a few projects on improving learning algorithms. There's a project on flying autonomous aircraft. There was a project, actually done by our TA Paul, on improving computer vision algorithms using machine learning. There were a couple of projects on Netflix rankings using learning algorithms; a few medical robots; one on segmenting [inaudible], on segmenting pieces of the body using learning algorithms; one on musical instrument detection; another on RNA sequence alignment; and a few algorithms on understanding the brain - neuroscience - actually quite a few projects on neuroscience; a couple of projects on understanding fMRI data, on brain scans, and so on; another project on market making, in financial trading. There was an interesting project on trying to use learning algorithms to decide what it is that makes a person's face physically attractive.
There's a learning algorithm on optical illusions, and so on. And it goes on - lots of fun projects. So take a look, then come up with your own ideas. But whatever you find cool and interesting, I hope you'll be able to make a machine learning project out of it. Yeah, question?

Student: Are these group projects?

Instructor (Andrew Ng): Oh, yes, thank you.

Student: So how many people can be in a group?

Instructor (Andrew Ng): Right. So projects can be done in groups of up to three people. So as part of forming study groups, later today as you get to know your classmates, I definitely also encourage you to grab two other people and form a group of up to three people for your project, okay? And just start brainstorming ideas for now amongst yourselves. You can also come and talk to me or the TAs if you want to brainstorm ideas with us.

Okay. So, one more organizational question. I'm curious: how many of you know MATLAB? Wow, cool, quite a lot. Okay. So as part of the - actually, how many of you know Octave or have used Octave? Oh, okay, a much smaller number.

So as part of this class, especially in the homeworks, we'll ask you to implement a few programs, a few machine learning algorithms, as part of the homeworks. And most of those homeworks will be done in either MATLAB or in Octave, which is sort of - I know some people call it a free version of MATLAB, which it sort of is, sort of isn't.

So I guess for those of you that haven't seen MATLAB before - and I know most of you have - MATLAB is, I guess, a programming language that makes it very easy to write code using matrices, to write code for numerical routines, to move data around, to plot data. And it's sort of an extremely easy-to-learn tool for implementing a lot of learning algorithms.

And in case some of you want to work on your own home computer or something, if you don't have a MATLAB license, for the purposes of this class there's also - [inaudible] write that down [inaudible] MATLAB - there's also a software package called Octave that you can download for free off the Internet. And it has somewhat fewer features than MATLAB, but it's free, and for the purposes of this class, it will work for just about everything.
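[Editor's note: a minimal Octave sketch of the matrix-style code described above; the data here is made up purely for illustration.]

```octave
% Solve a small linear system and plot a curve - the kind of
% matrix and plotting code MATLAB/Octave makes easy.
A = [1 2; 3 4];          % a 2x2 matrix
b = [5; 6];              % a column vector
x = A \ b;               % solve the linear system A*x = b

t = 0:0.1:2*pi;          % a range of values
plot(t, sin(t));         % plot a sine curve
xlabel('t'); ylabel('sin(t)');
```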
So, actually, just a side comment for those of you that haven't seen MATLAB before, I guess. Once, a colleague of mine at a different university, not at Stanford, who teaches another machine learning course - he's taught it for many years - so one day, he was in his office, and an old student of his from, like, ten years ago came into his office, and the student said, "Oh, professor, professor, thank you so much for your machine learning class. I learned so much from it. There's this stuff that I learned in your class that I now use every day. And it's helped me make lots of money, and here's a picture of my big house."

So my friend was very excited. He said, "Wow. That's great. I'm glad to hear this machine learning stuff was actually useful. So what was it that you learned? Was it logistic regression? Was it the PCA? Was it the Bayesian networks? What was it that you learned that was so helpful?" And the student said, "Oh, it was the MATLAB."

So for those of you that don't know MATLAB yet, I hope you do learn it. It's not hard, and we'll actually have a short MATLAB tutorial in one of the discussion sections for those of you that don't know it.

Okay. The very last logistical thing is the discussion sections. So discussion sections will be taught by the TAs, and attendance at discussion sections is optional, although they'll also be recorded and televised. And we'll use the discussion sections mainly for two things. For the next two or three weeks, we'll use the discussion sections to go over the prerequisites to this class - if some of you haven't seen probability or statistics for a while, or linear algebra - we'll go over those in the discussion sections as a refresher for those of you that want one.

Later in this quarter, we'll also use the discussion sections to go over extensions of the material that I'm teaching in the main lectures. So machine learning is a huge field, and there are a few extensions that we really want to teach but didn't have time in the main lectures for.
So later this quarter, we'll use the discussion sections to talk about things like convex optimization, and to talk a little bit about hidden Markov models, which are a type of machine learning algorithm for modeling time series, and a few other things - so, extensions to the material that I'll be covering in the main lectures. And attendance at the discussion sections is optional, okay?

So that was all I had on logistics. Before we move on to start talking a bit about machine learning, let me check what questions you have.

Student: [Inaudible] R or something like that?

Instructor (Andrew Ng): Oh, yeah, let's see, right. So our policy has been that you're welcome to use R, but I would strongly advise against it, mainly because in the last problem set, we actually supply some code that will run in Octave but that would be somewhat painful for you to translate into R yourself. So for your other assignments, if you wanna submit a solution in R, that's fine. But I think MATLAB is actually totally worth learning. I know R and MATLAB, and I personally end up using MATLAB quite a bit more often, for various reasons.

Student: For the [inaudible] project [inaudible]?

Instructor (Andrew Ng): So for the term project, you're welcome to do it in groups of three, or you're welcome to do it by yourself, or in groups of two. Grading is the same regardless of the group size. I recommend trying to form a team, but it's actually totally fine to do it in a smaller group if you want.

Student: [Inaudible] what language [inaudible]?

Instructor (Andrew Ng): So let's see. There is no C programming in this class, other than any that you may choose to do yourself in your project. So all the homeworks can be done in MATLAB or Octave. And let's see - I guess the programming prerequisite is more the ability to understand big-O notation, and knowledge of what a data structure like a linked list or a queue or a binary tree is, more so than your knowledge of C or Java specifically.

Student: Looking at the end-of-semester project, I mean, what exactly will you be testing over there? [Inaudible]?

Instructor (Andrew Ng): Of the project?

Student: Yeah.

Instructor (Andrew Ng): Yeah, let me answer that later.
415 00:30:20,850 --> 00:30:24,980 In a couple of weeks, I shall give out a handout with guidelines for the project. 416 00:30:25,380 --> 00:30:27,350 But for now, we should think of the goal as 417 00:30:27,790 --> 00:30:32,650 being to do a cool piece of machine learning work that will let you experience 418 00:30:33,000 --> 00:30:35,260 the joys of machine learning firsthand 419 00:30:35,790 --> 00:30:39,970 and really try to think about doing a publishable piece of work. 420 00:30:40,460 --> 00:30:45,040 So many students will try to build a cool machine learning application. That's probably the most common project. 421 00:30:45,650 --> 00:30:47,020 Some students will try to 422 00:30:47,440 --> 00:30:52,890 improve state-of-the-art machine learning. Some of those projects are also very successful. It's a little 423 00:30:53,250 --> 00:30:54,090 bit harder to do. 424 00:30:54,580 --> 00:30:58,810 And there's also a smaller minority of students that will sometimes try to prove 425 00:30:59,690 --> 00:31:03,110 - develop the theory of machine learning further or try to prove theorems about machine learning. 426 00:31:05,210 --> 00:31:07,450 So they're usually great projects of all of those 427 00:31:07,670 --> 00:31:10,400 types with applications and machine learning being the most common. Anything else? 428 00:31:17,400 --> 00:31:19,900 Okay, cool. So 429 00:31:21,390 --> 00:31:27,600 that was it for logistics. Let's talk about learning algorithms. 430 00:31:29,380 --> 00:31:31,240 So can I have the laptop display, please, 431 00:31:33,220 --> 00:31:33,540 or the projector? 432 00:31:41,810 --> 00:31:46,530 Actually, could you lower the big screen? Cool. 433 00:31:56,240 --> 00:31:59,420 This is amazing customer service. Thank you. 434 00:32:01,200 --> 00:32:04,820 I see. Okay, cool. Okay. No, that's fine. I see. 435 00:32:07,410 --> 00:32:08,550 Okay. That's cool. Thanks. Okay. 436 00:32:08,930 --> 00:32:14,740 Big screen isn't working today, but I hope you can read things on the smaller screens out there. Actually, [inaudible] I think 437 00:32:15,200 --> 00:32:17,380 this room just got a new projector that 438 00:32:17,880 --> 00:32:18,160 439 00:32:18,890 --> 00:32:25,380 - someone sent you an excited email - was it just on Friday? - saying we just got a new projector and they said 440 00:32:25,910 --> 00:32:29,070 4,000-to-1 something or other brightness ratio. I don't know. 441 00:32:29,550 --> 00:32:34,690 Someone was very excited about the new projector in this room, but I guess we'll see that in operation on Wednesday. 442 00:32:35,450 --> 00:32:36,600 So 443 00:32:37,600 --> 00:32:42,380 start by talking about what machine learning is. What is machine learning? 444 00:32:43,510 --> 00:32:45,600 Actually, can you read ... 445 00:32:45,880 --> 00:32:49,930 can you read the text out there? Raise your hand if the text on the small screens is legible. 446 00:32:52,060 --> 00:32:55,710 Oh, okay, cool, mostly legible. Okay. So I'll just read it out. 447 00:32:56,410 --> 00:32:58,190 So what is machine learning? 448 00:32:59,040 --> 00:33:02,720 Way back in about 1959, Arthur Samuel 449 00:33:03,220 --> 00:33:09,700 defined machine learning informally as the [inaudible|field study] that gives computers to learn 450 00:33:11,120 --> 00:33:15,340 - [inaudible] that gives computers the ability to learn without being explicitly programmed. 
451 00:33:16,220 --> 00:33:22,330 So Arthur Samuel, so way back in the history of machine learning, actually did something 452 00:33:23,750 --> 00:33:27,550 very cool, which was he wrote a checkers program, 453 00:33:28,350 --> 00:33:30,860 which would play games of checkers against itself. 454 00:33:31,800 --> 00:33:33,860 And so because 455 00:33:34,590 --> 00:33:38,700 a computer can play thousands of games against itself relatively quickly, 456 00:33:39,270 --> 00:33:43,160 Arthur Samuel had his program play thousands of games against itself, 457 00:33:43,770 --> 00:33:50,790 and over time it would start to learn to recognize patterns which led to wins and patterns which led to losses. 458 00:33:51,630 --> 00:33:53,130 So over time it learned things like that, 459 00:33:53,600 --> 00:33:57,470 "Gee, if I get a lot of pieces taken by the opponent, then 460 00:33:57,920 --> 00:34:01,680 I'm more likely to lose than win," or, "Gee, if I get my pieces into a certain position, 461 00:34:02,190 --> 00:34:03,630 then I'm especially likely 462 00:34:04,270 --> 00:34:04,920 to win rather than lose." 463 00:34:05,990 --> 00:34:08,170 And so over time, Arthur Samuel 464 00:34:08,820 --> 00:34:12,850 had a checkers program that would actually learn to play checkers by learning what are the 465 00:34:13,280 --> 00:34:19,350 sort of board positions that tend to be associated with wins and what are the board positions that tend to be associated with losses. 466 00:34:20,060 --> 00:34:20,390 And 467 00:34:21,280 --> 00:34:24,110 way back around 1959, 468 00:34:24,670 --> 00:34:26,850 the amazing thing about this was that 469 00:34:27,200 --> 00:34:31,760 his program actually learned to play checkers much better than Arthur Samuel himself could. 470 00:34:33,570 --> 00:34:33,780 So 471 00:34:34,360 --> 00:34:40,320 even today, there are some people that say, well, computers can't do anything that they're not explicitly programmed to. 472 00:34:40,770 --> 00:34:42,830 And Arthur Samuel's checkers program was 473 00:34:43,320 --> 00:34:44,430 maybe the first 474 00:34:45,420 --> 00:34:50,140 I think really convincing refutation of this claim. Namely, 475 00:34:50,950 --> 00:34:55,750 Arthur Samuel managed to write a checkers program that could play checkers much better than he personally could, 476 00:34:56,530 --> 00:35:01,340 and this is an instance of maybe computers learning to do things that they were not programmed explicitly to do. 477 00:35:03,430 --> 00:35:03,670 478 00:35:04,500 --> 00:35:11,210 Here's a more recent, a more modern, more formal definition of machine learning due to Tom Mitchell, 479 00:35:11,920 --> 00:35:17,070 who says that a well-posed learning problem is defined as follows: 480 00:35:17,810 --> 00:35:18,830 He says that 481 00:35:20,100 --> 00:35:25,490 a computer program is set to learn from an experience E with respect to some task T 482 00:35:26,130 --> 00:35:27,730 and some performance measure P 483 00:35:27,970 --> 00:35:31,760 if its performance on T as measured by P improves with experience E. 484 00:35:32,590 --> 00:35:35,190 So not only is it a definition, it even rhymes. 485 00:35:36,100 --> 00:35:36,740 486 00:35:38,180 --> 00:35:41,330 So, for example, in the case of checkers, 487 00:35:41,940 --> 00:35:45,670 the experience E that a program has would be 488 00:35:46,130 --> 00:35:49,990 the experience of playing lots of games of checkers against itself, say. 
The task T is the task of playing checkers, and the performance measure P will be something like the fraction of games it wins against a certain set of human opponents. And by this definition, we'll say that Arthur Samuel's checkers program has learned to play checkers, okay?

So as an overview of what we're going to do in this class: this class is sort of organized into four major sections. We're gonna talk about four major topics in this class, the first of which is supervised learning. So let me give you an example of that.

Suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, actually collected a data set for me last week, to use in an example later. Suppose that you go to collect statistics about how much houses cost in a certain geographic area. And Dan, the TA, collected data on housing prices in Portland, Oregon. So what you can do is, let's say, plot the square footage of the house against the list price of the house. You collect data on a bunch of houses, and let's say you get a data set like this, with houses of different sizes that are listed for different amounts of money.

Now, let's say that I'm trying to sell a house in the same area as Portland, Oregon, where the data comes from. Let's say I have a house that's this size in square footage, and I want an algorithm to tell me about how much I should expect my house to sell for. So there are lots of ways to do this, and some of you may have seen elements of what I'm about to say before.

So one thing you could do is look at this data and maybe fit a straight line to it. And then if this is my house, you may then look at the straight line and predict that my house is gonna go for about that much money, right? There are other decisions that we can make, which we'll talk about later, which is, well, what if I don't wanna fit a straight line? Maybe I should fit a quadratic function to it. Maybe that fits the data a little bit better. You notice that if you do that, the price of my house goes up a bit, so that'd be nice.

And this sort of learning problem, of learning to predict housing prices, is an example of what's called a supervised learning problem.
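[Editor's note: a minimal Octave/MATLAB sketch of the straight-line and quadratic fits just described; the housing numbers are made up purely for illustration.]

```octave
% Fit a line and a quadratic to made-up housing data, then
% predict the price of a new house from each fit.
sqft  = [1000 1500 2000 2500 3000];   % square footage of five houses
price = [200  245  320  390  430];    % list prices, in thousands of dollars

theta_line = polyfit(sqft, price, 1); % least-squares straight-line fit
theta_quad = polyfit(sqft, price, 2); % least-squares quadratic fit

my_house = 2200;                      % the house we want to price
polyval(theta_line, my_house)         % prediction from the straight line
polyval(theta_quad, my_house)         % prediction from the quadratic
```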
526 00:38:38,080 --> 00:38:38,320 527 00:38:38,790 --> 00:38:44,380 And the reason that it's called supervised learning is because we're providing the algorithm a data set of 528 00:38:45,490 --> 00:38:50,730 a bunch of square footages, a bunch of housing sizes, and as well as sort of the right answer 529 00:38:51,090 --> 00:38:52,870 of what the actual prices 530 00:38:53,510 --> 00:38:54,560 of a number of houses were, right? 531 00:38:55,690 --> 00:38:59,160 So we call this supervised learning because we're supervising the algorithm or, in other words, 532 00:38:59,610 --> 00:39:02,940 we're giving the algorithm the, quote, right answer 533 00:39:03,440 --> 00:39:04,370 for a number of houses. 534 00:39:04,970 --> 00:39:10,960 And then we want the algorithm to learn the association between the inputs and the outputs and to sort of give us more of the right answers, 535 00:39:11,970 --> 00:39:13,730 okay? 536 00:39:14,770 --> 00:39:20,560 It turns out this specific example that I drew here is an example of something called a regression problem. 537 00:39:20,870 --> 00:39:23,010 And 538 00:39:23,570 --> 00:39:29,250 the term regression sort of refers to the fact that the variable you're trying to predict is a continuous value ... 539 00:39:29,850 --> 00:39:31,730 continuous value and price. 540 00:39:32,420 --> 00:39:34,660 541 00:39:36,340 --> 00:39:36,940 542 00:39:37,710 --> 00:39:43,030 There's another class of supervised learning problems which we'll talk about, which are classification problems. 543 00:39:43,440 --> 00:39:45,200 And so, 544 00:39:49,040 --> 00:39:52,790 in a classification problem, the variable you're trying to predict is 545 00:39:53,410 --> 00:39:57,960 discrete rather than continuous. So as one specific example 546 00:39:58,520 --> 00:39:58,750 547 00:39:59,220 --> 00:40:04,510 - so actually a standard data set you can download online [inaudible] that lots of machine learning people have played with. 548 00:40:04,890 --> 00:40:08,750 Let's say you collect a data set on breast cancer tumors, 549 00:40:09,140 --> 00:40:10,920 and you want to know ... 550 00:40:11,350 --> 00:40:11,840 551 00:40:12,340 --> 00:40:18,600 learn the algorithm to predict whether or not a certain tumor is malignant. 552 00:40:19,070 --> 00:40:21,950 Malignant is the opposite of benign, right, so malignancy is 553 00:40:22,430 --> 00:40:23,430 a sort of harmful, bad tumor. 554 00:40:24,090 --> 00:40:30,430 So we collect some number of features, some number of properties of these tumors, and for the sake of 555 00:40:31,090 --> 00:40:36,880 sort of having a simple [inaudible] explanation, let's just say that we're going to look at 556 00:40:37,490 --> 00:40:42,280 the size of the tumor and depending on the size of the tumor, we'll try to figure out whether or not 557 00:40:42,940 --> 00:40:47,600 the tumor is malignant or benign. So the tumor is either malignant or benign, 558 00:40:48,160 --> 00:40:48,600 and so 559 00:40:49,010 --> 00:40:55,190 the variable in the Y-axis is either zero or 1, and so your data set may look something like ... 560 00:40:56,000 --> 00:41:02,240 may look something like that, right? 561 00:41:03,350 --> 00:41:07,620 And that's 1 and that's zero, okay? 562 00:41:08,870 --> 00:41:13,960 And so this is an example of a classification problem 563 00:41:14,710 --> 00:41:16,690 where 564 00:41:17,500 --> 00:41:17,850 565 00:41:19,350 --> 00:41:26,310 the variable you're trying to predict is a discreet value. It's either zero or 1. 
And in fact, more generally, there will be many learning problems where we'll have more than one input variable, more than one input feature, and use more than one variable to try to predict, say, whether a tumor is malignant or benign. So, for example, continuing with this, you may instead have a data set that looks like this. I'm gonna plot this data set in a slightly different way now. And I'm making this data set look much cleaner than it really is in reality, for illustration, okay?

For example, maybe the crosses indicate malignant tumors and the O's may indicate benign tumors. And so you may have a data set comprising patients of different ages who have different tumor sizes, where a cross indicates a malignant tumor and an O indicates a benign tumor. And you may want an algorithm to learn to predict, given a new patient, whether their tumor is malignant or benign.

So, for example, what a learning algorithm may do is maybe come in and decide that a straight line like that separates the two classes of tumors really well (a small code sketch of this kind of two-feature classifier appears at the end of this passage), and so if you have a new patient whose age and tumor size fall over there, then the algorithm may predict that the tumor is benign rather than malignant, okay?

So this is just another example of another supervised learning problem, and another classification problem. And so it turns out that one of the issues we'll talk about later in this class is that in this specific example, we're going to try to predict whether a tumor is malignant or benign based on two features, or based on two inputs, namely the age of the patient and the tumor size. It turns out that when you look at a real data set, you find that learning algorithms often use other sets of features. In the breast cancer data example, you would also use properties of the tumors, like clump thickness, uniformity of cell size, uniformity of cell shape, [inaudible] adhesion and so on - various other medical properties.
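[Editor's note: the sketch referenced above - a minimal two-feature classifier in Octave/MATLAB, using logistic regression (covered later in the course) as one way to learn a linear decision boundary; the patient data is made up purely for illustration.]

```octave
% Learn a linear boundary separating benign from malignant tumors
% from two features: patient age and tumor size.
X = [40 1.2; 35 0.8; 45 1.0;    % [age, tumor size]: benign examples
     55 3.0; 62 3.5; 50 2.8];   % [age, tumor size]: malignant examples
y = [0; 0; 0; 1; 1; 1];         % labels: 0 = benign, 1 = malignant

Xb = [ones(size(X,1),1) X];     % add an intercept term
theta = zeros(3,1);             % boundary parameters, initially zero
alpha = 0.001;                  % learning rate
for iter = 1:5000               % batch gradient ascent on the log-likelihood
  h = 1 ./ (1 + exp(-Xb*theta));         % sigmoid of the linear score
  theta = theta + alpha * Xb' * (y - h); % move toward better fit
end

% Predict for a new patient, age 48, with a tumor of size 2.5:
1 / (1 + exp(-[1 48 2.5]*theta))         % estimated probability of malignancy
```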
603 00:43:56,040 --> 00:43:57,140 And 604 00:43:57,980 --> 00:44:01,130 one of the most interesting things we'll talk about later this quarter is, 605 00:44:01,650 --> 00:44:01,870 606 00:44:02,280 --> 00:44:07,320 what if your data doesn't lie in a two-dimensional or three-dimensional or sort of even a finite-dimensional space 607 00:44:08,090 --> 00:44:09,090 608 00:44:09,790 --> 00:44:11,950 - what if your data actually lies in an infinite-dimensional space? 609 00:44:13,080 --> 00:44:14,700 Our plots here are in two-dimensional space. 610 00:44:15,620 --> 00:44:20,620 I can't plot you an infinite-dimensional space, right? And so it turns out that 611 00:44:21,830 --> 00:44:25,750 one of the most successful classes of machine learning algorithms - called support vector machines - 612 00:44:26,410 --> 00:44:29,440 actually takes data and 613 00:44:30,030 --> 00:44:34,760 maps it to an infinite-dimensional space and then does classification using not 614 00:44:34,970 --> 00:44:37,380 two features like I've done here, but an infinite number of features. 615 00:44:38,090 --> 00:44:43,080 And that will actually be one of the most fascinating things we talk about when we go deeply into classification algorithms. 616 00:44:44,010 --> 00:44:48,920 And it's actually an interesting question, right, so think about how do you even represent an infinite-dimensional vector 617 00:44:49,510 --> 00:44:54,400 in computer memory? You don't have an infinite amount of computer memory. How do you even represent a point 618 00:44:55,020 --> 00:44:56,270 that lies in an infinite-dimensional space? 619 00:44:57,460 --> 00:45:00,250 We'll talk about that when we get to 620 00:45:01,970 --> 00:45:03,930 support vector machines, okay? 621 00:45:04,290 --> 00:45:08,470 So let's see. 622 00:45:17,570 --> 00:45:18,630 623 00:45:19,580 --> 00:45:20,700 624 00:45:21,450 --> 00:45:26,860 So that was supervised learning. The second of the four major topics of this class will be 625 00:45:27,290 --> 00:45:29,330 learning theory. 626 00:45:31,400 --> 00:45:32,000 So 627 00:45:32,990 --> 00:45:37,090 I have a friend who teaches math at a different university, not at Stanford, and 628 00:45:37,620 --> 00:45:42,210 when you talk to him about his work and what he's really out to do, 629 00:45:42,640 --> 00:45:44,130 this friend of mine will 630 00:45:44,700 --> 00:45:49,150 - he's a math professor, right? - this friend of mine will sort of get a look of wonder in his eyes, and he'll 631 00:45:49,620 --> 00:45:51,800 tell you about how in his mathematical work, 632 00:45:52,210 --> 00:45:56,760 he feels like he's discovering truth and beauty in the universe. And he says it 633 00:45:57,280 --> 00:46:00,150 in sort of a really touching, sincere way, and then he has this 634 00:46:00,950 --> 00:46:07,900 - you can see it in his eyes - he has this deep appreciation of the truth and beauty in the universe as revealed to him by the math he does. 635 00:46:08,840 --> 00:46:14,280 In this class, I'm not gonna do any truth and beauty. 636 00:46:15,350 --> 00:46:21,730 In this class, I'm gonna talk about learning theory to try to convey to you an understanding of how and why learning algorithms 637 00:46:22,060 --> 00:46:23,000 work, 638 00:46:23,510 --> 00:46:28,060 so that we can apply these learning algorithms as effectively as possible. 639 00:46:28,760 --> 00:46:29,000 640 00:46:29,850 --> 00:46:31,580 So, for example,
641 00:46:31,980 --> 00:46:38,110 it turns out you can prove surprisingly deep theorems on when you can guarantee that a learning algorithm will work, all right? 642 00:46:38,490 --> 00:46:40,540 So think about a learning algorithm for reading zip codes. 643 00:46:41,330 --> 00:46:42,710 When can you prove a theorem 644 00:46:43,190 --> 00:46:48,130 guaranteeing that a learning algorithm will be at least 99.9 percent accurate on reading zip codes? 645 00:46:48,500 --> 00:46:50,430 This is actually somewhat surprising, but we can actually prove theorems 646 00:46:51,070 --> 00:46:53,140 showing when you can expect that to hold. 647 00:46:54,100 --> 00:46:58,770 We'll also sort of delve into learning theory to try to understand which algorithms can 648 00:46:59,050 --> 00:47:01,410 approximate different functions well, and when, 649 00:47:01,880 --> 00:47:08,410 and also try to understand things like how much training data you need. So how many examples of houses do you need 650 00:47:08,920 --> 00:47:11,580 in order for your learning algorithm to recognize the pattern 651 00:47:12,060 --> 00:47:15,820 between the square footage of a house and its housing price? 652 00:47:16,380 --> 00:47:17,840 And this will help us answer questions like, 653 00:47:18,710 --> 00:47:23,040 if you're trying to design a learning algorithm, should you be spending more time collecting more data, 654 00:47:23,440 --> 00:47:26,790 or is it the case that you already have enough data and it would be a waste of time to try to collect more? 655 00:47:27,570 --> 00:47:29,840 Okay? So 656 00:47:30,740 --> 00:47:35,630 I think learning algorithms are a very powerful tool, and as I 657 00:47:36,160 --> 00:47:41,300 walk around sort of industry in Silicon Valley or as I work with various businesses in CS and outside CS, 658 00:47:42,410 --> 00:47:45,150 I find that there's often a huge difference between 659 00:47:45,710 --> 00:47:47,910 how well someone who really understands this stuff 660 00:47:48,540 --> 00:47:52,820 can apply a learning algorithm versus someone who sort of gets it but sort of doesn't. 661 00:47:54,220 --> 00:47:55,770 The analogy I like to think of is, 662 00:47:56,200 --> 00:48:02,220 imagine you were going to a carpentry school instead of a machine learning class, right? If you go to a carpentry school, 663 00:48:02,740 --> 00:48:06,900 they can give you the tools of carpentry. They'll give you a hammer, a bunch of nails, a screwdriver or whatever. 664 00:48:07,580 --> 00:48:12,690 But a master carpenter will be able to use those tools far better than most of us in this room. 665 00:48:13,210 --> 00:48:15,690 I know a carpenter can do things with a hammer and nail that 666 00:48:15,920 --> 00:48:16,450 I couldn't possibly. 667 00:48:17,430 --> 00:48:23,760 And it's actually a little bit like that in machine learning, too. One thing that's sadly not taught in many courses on machine learning is 668 00:48:24,360 --> 00:48:28,550 how to take the tools of machine learning and really, really apply them well. So, 669 00:48:29,350 --> 00:48:29,870 in the same way 670 00:48:30,240 --> 00:48:36,820 - though the tools of machine learning are, I wanna say, quite a bit more advanced than the tools of carpentry; maybe a carpenter will disagree - 671 00:48:37,410 --> 00:48:43,160 a large part of this class will be just giving you the raw tools of machine learning, just the algorithms and so on.
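To give the "how much training data do you need?" question a concrete shape, here is a hedged little Octave experiment on synthetic housing data (every number invented): fit the model on more and more examples and watch the error on held-out examples.

    % Learning-curve sketch on synthetic data: when held-out error
    % stops improving, collecting more data is unlikely to help.
    m_total = 200;
    sqft  = 1000 + 2000*rand(m_total,1);            % invented house sizes
    price = 100 + 0.15*sqft + 20*randn(m_total,1);  % invented linear "world" plus noise
    test  = 151:200;                                % hold out the last 50 examples
    Xt = [ones(numel(test),1) sqft(test)];
    for m = [5 10 20 50 100 150]
      X = [ones(m,1) sqft(1:m)];
      theta = X \ price(1:m);                       % least-squares fit on m examples
      err = mean((Xt*theta - price(test)).^2);      % held-out mean squared error
      fprintf('m = %3d   test MSE = %8.2f\n', m, err);
    end

The flattening of that error curve as m grows is the empirical face of the sample-complexity questions that learning theory makes precise.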
672 00:48:43,860 --> 00:48:49,940 But what I plan to do throughout this entire quarter, not just in the segment on learning theory, but actually as 673 00:48:50,460 --> 00:48:54,320 a theme running through everything I do this quarter, will be to try to convey to you 674 00:48:54,960 --> 00:49:00,320 the skills to really take the learning algorithm ideas and really get them to work on a problem. 675 00:49:01,240 --> 00:49:02,720 It's sort of hard for me 676 00:49:03,420 --> 00:49:03,630 677 00:49:04,230 --> 00:49:09,810 to stand here and say how big a deal that is, but when I walk around companies in Silicon Valley, 678 00:49:10,300 --> 00:49:13,630 it's not at all uncommon for me to see someone using 679 00:49:14,370 --> 00:49:20,390 some machine learning algorithm and, when they explain to me what they've been doing for the last six months, I go, oh, gee, 680 00:49:21,290 --> 00:49:25,350 it should have been obvious from the start that for the last six months you've been wasting your time, right? 681 00:49:25,890 --> 00:49:27,370 682 00:49:27,910 --> 00:49:33,420 And so my goal in this class, running through the entire quarter, not just in learning theory, is actually 683 00:49:33,790 --> 00:49:35,510 not only to give you the tools of machine learning, 684 00:49:36,050 --> 00:49:38,810 but to teach you how to use them well. And 685 00:49:39,410 --> 00:49:42,690 I've noticed this is something that really not many other classes teach. 686 00:49:43,780 --> 00:49:48,260 And this is something I'm really convinced is a huge deal, and so 687 00:49:48,800 --> 00:49:55,420 by the end of this class, I hope all of you will be master carpenters. I hope all of you will be really good at applying these learning algorithms 688 00:49:56,300 --> 00:49:58,540 and getting them to work amazingly well on many problems. Okay? 689 00:50:01,270 --> 00:50:06,630 Let's see. So [inaudible] the board. 690 00:50:09,230 --> 00:50:11,950 After learning theory, 691 00:50:13,970 --> 00:50:17,320 there's another class of learning algorithms that I then want to 692 00:50:17,900 --> 00:50:21,800 teach you about, and that's unsupervised learning. 693 00:50:24,250 --> 00:50:27,110 694 00:50:27,780 --> 00:50:30,800 695 00:50:31,470 --> 00:50:34,140 So you recall, right, 696 00:50:35,250 --> 00:50:40,910 a little earlier I drew an example like this, right, where you have 697 00:50:41,330 --> 00:50:46,240 a couple of features, a couple of input variables, and sort of malignant tumors and benign tumors or whatever. 698 00:50:46,690 --> 00:50:49,280 And that was an example of a supervised learning problem because 699 00:50:49,850 --> 00:50:53,800 the data you have gives you the right answer for each of your patients. 700 00:50:54,130 --> 00:50:57,560 The data tells you this patient has a malignant tumor; this patient has a benign tumor. 701 00:50:58,250 --> 00:51:01,970 So it had the right answers, and you wanted the algorithm to just produce more of the same. 702 00:51:03,110 --> 00:51:08,440 In contrast, in an unsupervised learning problem, 703 00:51:09,070 --> 00:51:10,240 this is the sort of data you get, 704 00:51:10,680 --> 00:51:16,810 okay? 705 00:51:17,700 --> 00:51:18,050 706 00:51:19,210 --> 00:51:20,190 Where, 707 00:51:20,600 --> 00:51:28,280 speaking loosely, you're given a data set, and I'm not gonna tell you what the right answer is on any of your data. I'm just gonna give you a data set and I'm gonna say, 708 00:51:28,760 --> 00:51:31,050 "Would you please find interesting structure in this data set?"
709 00:51:31,970 --> 00:51:35,780 So that's the unsupervised learning problem, where you're sort of not given the right answer for everything. 710 00:51:36,400 --> 00:51:40,540 So, for example, an algorithm may 711 00:51:42,230 --> 00:51:42,750 find 712 00:51:43,750 --> 00:51:45,600 structure in the data in the form of 713 00:51:46,190 --> 00:51:48,800 the data being partitioned into two clusters; 714 00:51:49,280 --> 00:51:54,440 clustering is sort of one example of an unsupervised learning problem. 715 00:51:55,340 --> 00:51:57,460 So 716 00:51:58,930 --> 00:51:59,310 I hope you can see this. 717 00:52:00,070 --> 00:52:01,390 It turns out that these sorts of 718 00:52:01,670 --> 00:52:04,860 unsupervised learning algorithms are also used in many problems. 719 00:52:05,110 --> 00:52:09,340 This is a screen shot - this is a picture I got from Sue Emvee, who's a PhD student here, 720 00:52:09,870 --> 00:52:13,250 who is applying unsupervised learning algorithms to try to understand gene data, 721 00:52:14,450 --> 00:52:19,380 so the idea is to look at genes as individuals and group them into clusters based on ... 722 00:52:19,810 --> 00:52:24,960 based on properties of how the genes respond to different experiments. 723 00:52:26,150 --> 00:52:29,650 Another interesting application of 724 00:52:30,430 --> 00:52:35,970 [inaudible] sorts of clustering algorithms is actually image processing; this is one I got from Steve Gules, who's another PhD student. 725 00:52:36,590 --> 00:52:40,480 It turns out what you can do is, if you give this sort of 726 00:52:41,250 --> 00:52:44,770 data, say an image, to certain unsupervised learning algorithms, 727 00:52:45,560 --> 00:52:48,070 they will then learn to group pixels together 728 00:52:48,780 --> 00:52:54,060 and say, gee, these pixels seem to belong together, and those pixels seem to belong together. 729 00:52:54,750 --> 00:52:57,390 And so the images you see on the bottom 730 00:52:57,940 --> 00:52:59,650 - I guess you can just barely see them on there - 731 00:53:01,140 --> 00:53:04,700 so the images you see on the bottom are 732 00:53:05,820 --> 00:53:09,860 groupings - are what the algorithm has done to group certain pixels together. 733 00:53:10,320 --> 00:53:14,050 On a small display, it might be easier to just look at the image on the right. 734 00:53:14,530 --> 00:53:15,680 The two images on the bottom 735 00:53:16,250 --> 00:53:20,860 are two sort of identical visualizations of the same grouping ... 736 00:53:21,200 --> 00:53:22,940 grouping of the pixels 737 00:53:23,510 --> 00:53:24,840 into contiguous regions. 738 00:53:25,640 --> 00:53:26,060 739 00:53:26,660 --> 00:53:30,820 And so it turns out that this sort of clustering algorithm, or this sort of unsupervised learning algorithm, 740 00:53:31,370 --> 00:53:33,300 which learns to group pixels together, 741 00:53:33,820 --> 00:53:38,870 turns out to be useful for many applications in vision, in computer vision and image processing. 742 00:53:39,620 --> 00:53:44,890 I'll just show you one example, and this is a rather cool one that two students, Ashutosh Saxena and Min Sun here, did, 743 00:53:45,510 --> 00:53:47,020 which is: given an image like this, 744 00:53:47,460 --> 00:53:50,290 right? This is actually a picture taken of the Stanford campus. 745 00:53:50,830 --> 00:53:56,060 You can apply that sort of clustering algorithm and group the picture into regions.
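Clustering itself is easy to sketch. As a toy illustration - not the algorithm those students actually ran - here is a minimal k-means loop in Octave on synthetic two-dimensional data:

    % Minimal k-means sketch: partition unlabeled 2-D points into k clusters.
    % (A toy: it ignores the empty-cluster edge case.)
    X = [randn(50,2); randn(50,2) + 4];     % two invented, well-separated blobs
    k = 2;
    mu = X(randperm(size(X,1), k), :);      % initialize centroids at random data points
    for it = 1:20
      d = zeros(size(X,1), k);
      for j = 1:k
        d(:,j) = sum((X - mu(j,:)).^2, 2);  % squared distance of every point to centroid j
      end
      [~, c] = min(d, [], 2);               % assignment step: nearest centroid wins
      for j = 1:k
        mu(j,:) = mean(X(c == j, :), 1);    % update step: centroid moves to its points' mean
      end
    end

Nothing in that loop ever sees a label; the "structure" it finds is just the partition of the points into groups.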
746 00:53:56,800 --> 00:53:57,520 Let me actually 747 00:54:00,510 --> 00:54:02,060 blow that up so that you can see it more clearly. Okay. 748 00:54:20,750 --> 00:54:26,940 So in the middle, you see the lines sort of grouping the image together, grouping the image into contiguous regions. 749 00:54:27,440 --> 00:54:31,720 And what Ashutosh and Min did was they then applied a learning algorithm to say, 750 00:54:32,280 --> 00:54:34,150 can we take this clustering 751 00:54:34,910 --> 00:54:35,460 and use it 752 00:54:36,320 --> 00:54:40,340 to build a 3D model of the world? 753 00:54:41,810 --> 00:54:44,680 And so, using the clustering, 754 00:54:47,520 --> 00:54:50,070 they then had a learning algorithm try to learn 755 00:54:51,620 --> 00:54:53,640 what the 3D structure of the world looks like, 756 00:54:56,580 --> 00:55:00,520 so that they could come up with a 3D model that you can sort of fly through, okay? 757 00:55:05,180 --> 00:55:10,460 Many people used to think it's not possible to take a single image and build a 3D model from it, 758 00:55:13,020 --> 00:55:13,410 but 759 00:55:15,040 --> 00:55:18,270 using a learning algorithm - with that sort of clustering algorithm as the first step - 760 00:55:23,730 --> 00:55:24,150 they were able to. 761 00:55:24,700 --> 00:55:30,250 I'll just show you one more example. I like this because it's a picture of Stanford, with our beautiful Stanford campus. 762 00:55:31,140 --> 00:55:37,920 So again, taking the same sort of clustering algorithms, taking the same sort of unsupervised learning algorithm, you can group the pixels into different regions. 763 00:55:38,780 --> 00:55:40,770 And using that as a pre-processing step, 764 00:55:42,300 --> 00:55:43,020 they eventually built 765 00:55:44,280 --> 00:55:49,720 this sort of 3D model of the Stanford campus from a single picture. You can sort of walk into the scene, 766 00:55:51,860 --> 00:55:52,140 look around the campus. Okay? 767 00:55:57,460 --> 00:56:00,160 This actually turned out to be a mix of supervised and unsupervised learning, 768 00:56:00,470 --> 00:56:05,240 but the unsupervised learning - this sort of clustering - was the first step. 769 00:56:06,920 --> 00:56:07,650 So it turns out 770 00:56:10,800 --> 00:56:11,020 that ... 771 00:56:11,780 --> 00:56:16,840 these sorts of unsupervised clustering algorithms are actually routinely used for many different problems, 772 00:56:17,310 --> 00:56:21,110 things like organizing computing clusters, social network analysis, 773 00:56:21,710 --> 00:56:29,760 market segmentation - so if you're a marketer and you want to divide your market into different segments, or different groups of people, to market to them separately - 774 00:56:30,280 --> 00:56:35,800 even astronomical data analysis and understanding how galaxies are formed. These are 775 00:56:36,260 --> 00:56:40,990 just a sort of small sample of the applications of unsupervised learning algorithms and clustering algorithms 776 00:56:41,690 --> 00:56:42,890 that we'll talk about later in this class. 777 00:56:43,740 --> 00:56:44,100 778 00:56:44,960 --> 00:56:50,590 Just one particularly cool example of an unsupervised learning algorithm that I want to tell you about. 779 00:56:51,470 --> 00:56:56,840 And to motivate that, I'm gonna tell you about what's called the cocktail party problem, which is: 780 00:56:57,470 --> 00:57:01,020 imagine that you're at some cocktail party and there are lots of people standing all over.
781 00:57:01,510 --> 00:57:03,650 And you know how it is, right, if you're at a large party, 782 00:57:04,010 --> 00:57:07,990 everyone's talking, and it can sometimes be very hard to hear even the person in front of you. 783 00:57:08,330 --> 00:57:11,020 So imagine a large cocktail party with lots of people. 784 00:57:11,500 --> 00:57:15,270 So the problem is: with all of these people talking, can you 785 00:57:15,680 --> 00:57:19,640 separate out the voice of just the person you're interested in talking to, 786 00:57:20,260 --> 00:57:21,440 with all this loud background noise? 787 00:57:22,580 --> 00:57:23,710 So I'll show you 788 00:57:24,230 --> 00:57:28,940 ... I'll show you a specific example in a second, but here's a cocktail party 789 00:57:29,760 --> 00:57:33,710 that's, I guess, rather sparsely attended by just two people. But 790 00:57:34,830 --> 00:57:35,110 791 00:57:36,580 --> 00:57:39,690 what we're gonna do is we'll put two microphones in the room, 792 00:57:40,430 --> 00:57:43,940 okay? And so because the microphones 793 00:57:44,670 --> 00:57:49,520 are at slightly different distances from the two people, and the two people may speak at slightly different volumes, 794 00:57:50,010 --> 00:57:50,820 each microphone 795 00:57:51,170 --> 00:57:55,280 will pick up an overlapping combination of these two people's voices 796 00:57:55,960 --> 00:57:58,720 - slightly different overlapping voices. 797 00:57:59,170 --> 00:58:02,850 So Speaker 1's voice may be louder on Microphone 1, and Speaker 2's 798 00:58:03,400 --> 00:58:05,150 voice may be louder on Microphone 2, whatever. 799 00:58:05,710 --> 00:58:10,440 But the question is, given these microphone recordings, can you separate out the original speakers' voices? 800 00:58:11,820 --> 00:58:12,080 So 801 00:58:13,300 --> 00:58:19,270 I'm gonna play some audio clips that were collected by Te-Won Lee at UCSD. 802 00:58:19,710 --> 00:58:20,850 Let me show ... 803 00:58:21,600 --> 00:58:28,030 I'm gonna actually play for you the original raw microphone recordings from this cocktail party. So this is the 804 00:58:29,070 --> 00:58:29,860 Microphone 1: 805 00:58:30,880 --> 00:58:34,920 Microphone 1: One, two, three, four, five, six, ... Microphone 2: Uno, dos, tres, cuatro, cinco, seis, 806 00:58:35,380 --> 00:58:41,840 Microphone 1: seven, eight, nine, ten. Microphone 2: siete, ocho, nueve, diez. Instructor (Andrew Ng): So it's a fascinating cocktail party, with people counting from one to ten. 807 00:58:43,030 --> 00:58:44,930 This is the second microphone: Microphone 1: One, two, Microphone 2: Uno, dos, 808 00:58:45,250 --> 00:58:50,750 Microphone 1: three, four, five, six, seven, eight, nine, ten. Microphone 2: tres, cuatro, cinco, seis, siete, ocho, nueve, diez. 809 00:58:52,570 --> 00:59:00,180 Instructor (Andrew Ng): Okay. So in unsupervised learning, we don't know what the right answer is, right? So what we're going to do is take exactly the two microphone recordings you just heard 810 00:59:00,850 --> 00:59:05,110 and give them to an unsupervised learning algorithm and tell the algorithm 811 00:59:06,060 --> 00:59:09,680 to discover structure in the data - [inaudible] - what structure is there in this data? 812 00:59:10,550 --> 00:59:17,390 And we actually don't know what the right answer is offhand.
So give this data to an unsupervised learning algorithm, 813 00:59:18,240 --> 00:59:20,800 and what the algorithm does in this case is it will discover that 814 00:59:21,400 --> 00:59:26,880 this data can actually be explained by two independent speakers speaking at the same time, 815 00:59:27,580 --> 00:59:28,420 and it can further 816 00:59:28,950 --> 00:59:35,170 separate out the two speakers for you. So here's Output 1 of the algorithm: Output 1: One, two, three, four, 817 00:59:35,520 --> 00:59:36,650 five, six, 818 00:59:37,020 --> 00:59:40,220 seven, eight, nine, ten. 819 00:59:40,630 --> 00:59:45,480 Instructor (Andrew Ng): And here's the second output: Output 2: Uno, dos, tres, cuatro, cinco, seis, 820 00:59:45,940 --> 00:59:48,480 siete, ocho, nueve, diez. 821 00:59:49,090 --> 00:59:51,000 Instructor (Andrew Ng): And so the algorithm discovers that, 822 00:59:51,490 --> 00:59:56,580 gee, the structure underlying the data is really that there are two sources of sound, and here they are. 823 00:59:57,540 --> 00:59:59,890 I'll show you one more example. This is a, 824 01:00:00,800 --> 01:00:03,320 well, this is a second sort of 825 01:00:04,040 --> 01:00:12,920 different pair of microphone recordings: Microphone 1: One, two, three, four, five, six, seven, eight, nine, ten. Microphone 2: [Music playing.] 826 01:00:13,290 --> 01:00:15,940 Instructor (Andrew Ng): So the poor guy is not at a cocktail party. He's talking to his radio. 827 01:00:16,500 --> 01:00:20,560 Here's the second recording: Microphone 1: One, two, three, four, five, Microphone 2: [Music playing.] 828 01:00:20,890 --> 01:00:24,610 Microphone 1: six, seven, eight, nine, ten. Microphone 2: [Music playing.] 829 01:00:24,880 --> 01:00:28,510 Instructor (Andrew Ng): Right. And we get this data. It's the same unsupervised learning algorithm. 830 01:00:28,800 --> 01:00:33,270 The algorithm is actually called independent component analysis, and later in this quarter, you'll see why. 831 01:00:33,900 --> 01:00:35,700 And the output's the following: 832 01:00:36,140 --> 01:00:41,600 Output 1: One, two, three, four, five, six, seven, eight, nine, ten. 833 01:00:42,410 --> 01:00:50,060 Instructor (Andrew Ng): And that's the second one: Output 2: [Music playing.] 834 01:00:51,560 --> 01:00:56,400 Instructor (Andrew Ng): Okay. So it turns out that beyond solving the cocktail party problem, 835 01:00:56,800 --> 01:01:02,430 this specific class of unsupervised learning algorithms is also applied to a bunch of other problems, 836 01:01:02,820 --> 01:01:03,880 like in text processing or 837 01:01:04,230 --> 01:01:10,830 understanding functional brain imaging data, like the magnetoencephalogram - which is a bit like EEG data. We'll talk about that more 838 01:01:11,300 --> 01:01:11,820 when we go and 839 01:01:12,310 --> 01:01:16,730 describe ICA, or independent component analysis, algorithms, which is what you just saw. 840 01:01:17,330 --> 01:01:17,640 841 01:01:18,420 --> 01:01:20,490 And as an aside, 842 01:01:21,620 --> 01:01:23,630 this algorithm I just showed you - it seems like it must be 843 01:01:23,920 --> 01:01:29,450 a pretty complicated algorithm, right, to take these overlapping audio streams and separate them out. It sounds like a pretty complicated thing to do. 844 01:01:30,330 --> 01:01:30,760 So you're gonna ask, 845 01:01:31,030 --> 01:01:33,850 how complicated is it really to implement an algorithm like this?
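As it happens, not very. A hedged sketch of the idea - run on synthetic signals rather than the actual recordings above, and using a simple FastICA-style iteration rather than the exact algorithm behind the demo - fits in a couple of dozen lines of Octave:

    % Minimal ICA sketch: mix two invented independent sources,
    % then recover them (up to order, sign, and scale).
    n = 5000;
    S = [sin((1:n)/7); rand(1,n) - 0.5];    % source 1: a sine; source 2: uniform noise
    A = [0.6 0.4; 0.45 0.55];               % the unknown "room": a mixing matrix
    X = A * S;                              % the two "microphone" recordings

    X = X - mean(X, 2);                     % center the mixtures
    [E, D] = eig(cov(X'));
    Z = D^(-1/2) * E' * X;                  % whiten: unit covariance

    W = orth(randn(2, 2));                  % random orthogonal starting point
    for it = 1:100                          % fixed-point iteration, cubic nonlinearity
      W = (Z * (Z'*W).^3) / n - 3*W;        % push toward maximally non-Gaussian directions
      [U, ~, V] = svd(W);  W = U * V';      % re-orthogonalize the unmixing directions
    end
    Y = W' * Z;                             % rows of Y approximate the original sources

Conceptually this is the cocktail party in miniature: the only assumption doing the work is that the two sources are independent and non-Gaussian.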
846 01:01:34,870 --> 01:01:35,700 It turns out 847 01:01:37,280 --> 01:01:40,300 if you do it in MATLAB, you can do it in one line of code. 848 01:01:41,370 --> 01:01:45,500 So I got this from Sam Roweis at the University of Toronto, and 849 01:01:46,260 --> 01:01:49,990 the example I showed you actually used a more complicated ICA algorithm than this. 850 01:01:50,290 --> 01:01:52,370 But nonetheless, I guess this is why, 851 01:01:53,050 --> 01:01:58,080 for this class, I'm going to ask you to do most of your programming in MATLAB and Octave, 852 01:01:58,750 --> 01:02:03,210 because if you try to implement the same algorithm in C or Java or something, 853 01:02:04,220 --> 01:02:05,390 I can tell you from personal, 854 01:02:05,800 --> 01:02:10,570 painful experience, you end up writing pages and pages of code rather than relatively few lines of code. 855 01:02:11,810 --> 01:02:17,330 I'll also mention that it did take researchers many, many years to come up with that one line of code, 856 01:02:17,850 --> 01:02:19,580 so this is not easy. 857 01:02:20,290 --> 01:02:21,260 858 01:02:23,800 --> 01:02:29,370 So that was unsupervised learning, and then the last of the four major topics I wanna tell you about is 859 01:02:29,750 --> 01:02:31,470 reinforcement learning. 860 01:02:32,130 --> 01:02:38,700 And this refers to problems where you don't do one-shot decision-making. So, for example, 861 01:02:39,230 --> 01:02:42,390 in the supervised learning cancer prediction problem, 862 01:02:42,910 --> 01:02:44,530 you have a patient come in, 863 01:02:44,910 --> 01:02:51,250 you predict that the tumor is malignant or benign. And then based on your prediction, maybe the patient lives or dies, and then that's it, right? 864 01:02:51,600 --> 01:02:55,120 So you make a decision and then there's a consequence. You either got it right or wrong. 865 01:02:55,710 --> 01:03:01,850 In reinforcement learning problems, you are usually asked to make a sequence of decisions over time. 866 01:03:02,920 --> 01:03:03,670 So, for example, 867 01:03:04,460 --> 01:03:07,170 this is something that my students and I work on. 868 01:03:07,660 --> 01:03:14,430 If I give you the keys to an autonomous helicopter - we actually have this helicopter here at Stanford - 869 01:03:14,910 --> 01:03:16,580 how do you write a program to make it fly, right? 870 01:03:17,220 --> 01:03:20,310 You notice that if you make a wrong decision on a helicopter, 871 01:03:20,760 --> 01:03:21,050 you know, 872 01:03:21,350 --> 01:03:27,810 the consequence of crashing it may not happen until much later. And in fact, usually you need to make a whole sequence of bad decisions to crash a helicopter. 873 01:03:28,650 --> 01:03:33,910 But conversely, you also need to make a whole sequence of good decisions in order to fly a helicopter really well. 874 01:03:34,920 --> 01:03:35,970 So 875 01:03:36,890 --> 01:03:41,480 I'm gonna show you some fun videos of learning algorithms flying helicopters. 876 01:03:42,110 --> 01:03:46,160 This is a video of our helicopter at Stanford flying 877 01:03:46,590 --> 01:03:49,720 using a controller that was learned using a reinforcement learning algorithm. 878 01:03:50,610 --> 01:03:52,620 879 01:03:52,830 --> 01:03:56,840 So this was done on the Stanford football field, and we'll zoom out the camera in a second. 880 01:03:57,230 --> 01:03:59,870 You'll sort of see the trees planted in the sky.
881 01:04:00,730 --> 01:04:02,830 So 882 01:04:04,470 --> 01:04:05,370 883 01:04:05,760 --> 01:04:12,560 maybe this is one of the most difficult aerobatic maneuvers flown on any helicopter 884 01:04:13,030 --> 01:04:14,160 under computer control. 885 01:04:14,620 --> 01:04:19,540 And this controller, which is very, very hard for a human to sit down and write out, 886 01:04:19,950 --> 01:04:22,370 was learned using one of these reinforcement learning algorithms. 887 01:04:23,020 --> 01:04:23,270 888 01:04:24,050 --> 01:04:29,880 Just a word about that: The basic idea behind a reinforcement learning algorithm is this idea of what's called a reward function. 889 01:04:30,780 --> 01:04:34,120 The way to think about it is: imagine you're trying to train a dog. 890 01:04:34,780 --> 01:04:39,530 So every time your dog does something good, you say, "Good dog," and you reward the dog. 891 01:04:40,460 --> 01:04:42,900 Every time your dog does something bad, you go, "Bad dog," right? 892 01:04:43,950 --> 01:04:49,730 And hopefully, over time, your dog will learn to do the right things to get more of the positive rewards, to get more of 893 01:04:50,130 --> 01:04:52,080 the "Good dogs" and to get fewer of the "Bad dogs." 894 01:04:53,150 --> 01:04:59,480 So the way we teach a helicopter to fly or any of these robots is sort of the same thing. Every time the helicopter crashes, we go, "Bad helicopter," 895 01:04:59,990 --> 01:05:00,210 and 896 01:05:00,690 --> 01:05:05,730 every time it does the right thing, we go, "Good helicopter," and over time it learns how to control itself 897 01:05:05,990 --> 01:05:08,320 so as to get more of these positive rewards. 898 01:05:08,730 --> 01:05:10,990 So reinforcement learning is - I think of it as a way 899 01:05:11,460 --> 01:05:14,610 for you to specify what you want done, so you have to specify 900 01:05:15,170 --> 01:05:17,540 what is "good dog" and what is "bad dog" behavior. 901 01:05:18,140 --> 01:05:20,220 And then it's up to the learning algorithm to figure out 902 01:05:20,660 --> 01:05:25,800 how to maximize the "good dog" reward signals and minimize the "bad dog" punishments. 903 01:05:26,500 --> 01:05:26,910 904 01:05:28,090 --> 01:05:29,570 Just to show you - 905 01:05:30,210 --> 01:05:34,250 so it turns out reinforcement learning is applied to other problems in robotics. 906 01:05:34,680 --> 01:05:39,860 It's applied to things in web crawling and so on. But it's just cool to show videos, so let me just show a bunch of them. 907 01:05:40,480 --> 01:05:42,630 This robot was actually ... 908 01:05:43,560 --> 01:05:47,220 this learning algorithm was actually implemented by our head TA, Zico, 909 01:05:48,100 --> 01:05:53,550 to program a four-legged dog. I guess Sam Shriver in this class also worked on the project, 910 01:05:53,800 --> 01:05:55,070 and Peter Renfrew and 911 01:05:55,660 --> 01:05:58,520 Mike and a few others. 912 01:06:01,180 --> 01:06:05,120 But I guess this really is a good dog/bad dog example, since it's a robot dog. 913 01:06:05,940 --> 01:06:06,820 914 01:06:07,560 --> 01:06:09,030 The second video, on the right, shows 915 01:06:09,670 --> 01:06:12,540 some of the students - I guess Peter, Zico, Tonca - 916 01:06:13,050 --> 01:06:18,900 working on a robotic snake, again using learning algorithms to teach a snake robot to climb over obstacles. 917 01:06:19,640 --> 01:06:23,320 Below that, this is kind of a fun example.
918 01:06:24,170 --> 01:06:25,800 Ashutosh Saxena and Jeff Michaels 919 01:06:26,310 --> 01:06:30,820 used learning algorithms to teach a car how to drive at reasonably high speeds off roads, 920 01:06:31,590 --> 01:06:33,520 avoiding obstacles. 921 01:06:36,040 --> 01:06:37,170 922 01:06:38,010 --> 01:06:42,710 And on the lower right, that's a robot programmed by PhD student Eva Roshen, 923 01:06:43,450 --> 01:06:48,030 who used learning algorithms to teach this sort of somewhat strangely configured robot 924 01:06:48,730 --> 01:06:51,080 how to get on top of an obstacle, how to get over an obstacle. 925 01:06:52,500 --> 01:06:54,600 Sorry. I know the video's kind of small. I hope you can sort of see it. 926 01:06:56,570 --> 01:06:59,400 Okay? So I think all of these are robots that 927 01:07:00,280 --> 01:07:05,600 I think would be very difficult to hand-code a controller for. By using these sorts of learning algorithms, 928 01:07:06,190 --> 01:07:08,630 you can in relatively short order 929 01:07:09,360 --> 01:07:10,070 930 01:07:11,200 --> 01:07:13,330 get a robot to do often pretty amazing things. 931 01:07:24,250 --> 01:07:26,810 Okay. So 932 01:07:28,240 --> 01:07:33,260 that was most of what I wanted to say today. Just a couple more last things, 933 01:07:33,780 --> 01:07:35,540 but let me just check what questions you have right now. 934 01:07:38,310 --> 01:07:41,230 935 01:07:44,100 --> 01:07:45,570 936 01:07:45,980 --> 01:07:51,650 So if there are no questions, I'll just close with two reminders, which are: 937 01:07:53,040 --> 01:07:57,330 after class today, or as you start to talk with other people in this class, 938 01:07:57,770 --> 01:08:02,840 I just encourage you again to try to find project partners 939 01:08:03,320 --> 01:08:04,250 to do your project with. 940 01:08:04,590 --> 01:08:10,730 And also, this is a good time to start forming study groups, so either talk to your friends or post in the newsgroup, 941 01:08:11,210 --> 01:08:12,530 but we just encourage you to 942 01:08:13,120 --> 01:08:18,450 try to start to do both of those today, okay? Form study groups, and try to find two other project partners. 943 01:08:19,180 --> 01:08:19,800 944 01:08:20,260 --> 01:08:24,450 So thank you. I'm looking forward to teaching this class, and I'll see you in a couple of days.