[Music]

This presentation is delivered by the Stanford Center for Professional Development.

Okay. Good morning. Welcome to CS229, the machine learning class. So what I want to do today is just spend a little time going over the logistics of the class, and then we'll start to talk a bit about machine learning.

By way of introduction, my name's Andrew Ng and I'll be the instructor for this class. I personally work in machine learning, and I've worked on it for about 15 years now, and I actually think that machine learning is the most exciting field of all the computer sciences. So I'm always excited about teaching this class. Sometimes I actually think that machine learning is not only the most exciting thing in computer science, but the most exciting thing in all of human endeavor - so maybe a little bias there.

I also want to introduce the TAs, who are all graduate students doing research in or related to machine learning, across all aspects of machine learning. Paul Baumstarck works in machine learning and computer vision. Catie Chang is actually a neuroscientist who applies machine learning algorithms to try to understand the human brain. Tom Do is another PhD student; he works in computational biology and in sort of the basic fundamentals of machine learning. Zico Kolter is the head TA - he's been head TA two years in a row now - and he also works in machine learning and applies it to a bunch of robots. And Daniel Ramage is - I guess he's not here - Daniel applies learning algorithms to problems in natural language processing.

So you'll get to know the TAs and me much better throughout this quarter, but just from the sorts of things the TAs do, I hope you can already tell that machine learning is a highly interdisciplinary topic, in which just the TAs apply learning algorithms to problems in computer vision and biology and robots and language. And machine learning is one of those things that has had, and is having, a large impact on many applications. So just in my own daily work, I actually frequently end up talking to people from helicopter pilots to biologists to people in computer systems or databases to economists, and also an unending stream of people from industry coming to Stanford interested in applying machine learning methods to their own problems.

So yeah, this is fun.
A couple of weeks ago, a student actually forwarded to me an article in Computer World about the 12 IT skills that employers can't say no to. So it's about sort of the 12 most desirable skills in all of IT and all of information technology, and topping the list was actually machine learning. So I think this is a good time to be learning this stuff, and learning algorithms are having a large impact on many segments of science and industry.

I'm actually curious about something. Learning algorithms is one of the things that touches many areas of science and industry, and I'm just kind of curious: how many people here are computer science majors, are in the computer science department? Okay, about half of you. How many people are from EE? Oh, okay, maybe about a fifth. How many biologists are there here? Wow, just a few, not many. I'm surprised. Anyone from statistics? Okay, a few. So where are the rest of you from?

Say again? iCME. Cool. CivE, and what else? Synthesis, [inaudible] systems. Yeah, cool. ChemE. Cool. Aero/astro. Yes, right. Yeah, okay, cool. Anyone else? Pardon? MS&E. All right. Cool. Yeah. Pardon? Oh, I see, industry. Okay. Cool. Great, great.

So as you can tell from the cross-section of this class, I think we're a very diverse audience in this room, and that's one of the things that makes this class fun to teach and fun to be in, I think.

So in this class, we'll try to convey to you a broad set of principles and tools that will be useful for doing many, many things. And every time I teach this class, I can actually very confidently say that after December, no matter what you're going to do after this December, when you've completed this class, you'll find the things you learn in this class very useful, and these things will be useful pretty much no matter what you end up doing later in your life.

So I have more logistics to go over later, but let's say a few more words about machine learning. I feel that machine learning grew out of early work in AI, early work in artificial intelligence. And over the last - I want to say last 15 or last 20 years or so - it's been viewed as a sort of growing new capability for computers. And in particular, it turns out that there are many programs, there are many applications, that you can't program by hand.
For example, if you want to get a computer to read handwritten characters, to read sort of handwritten digits, it actually turns out to be amazingly difficult to write a piece of software to take this input, an image of something that I wrote, and to figure out just what it is - to translate my cursive handwriting, to extract the characters I wrote out in longhand.

And other things: one thing that my students and I do is autonomous flight. It turns out to be extremely difficult to sit down and write a program to fly a helicopter. But in contrast, if you want to do things like get software to fly a helicopter or have software recognize handwritten digits, one very successful approach is to use a learning algorithm and have a computer learn by itself how to, say, recognize your handwriting. And in fact, for handwritten digit recognition, this is pretty much the only approach that works well. It's used for applications that are hard to program by hand.

Learning algorithms have also made, I guess, significant inroads in what's sometimes called database mining. So, for example, with the growth of IT and computers, increasingly many hospitals are keeping around medical records of what sort of patients they had, what problems they had, what their prognoses were, what the outcomes were. And taking all of these medical records, which started to be digitized only about maybe 15 years ago, applying learning algorithms to them can turn raw medical records into what I might loosely call medical knowledge, in which we start to detect trends in medical practice and even start to alter medical practice as a result of medical knowledge that's derived by applying learning algorithms to the sorts of medical records that hospitals have just been building over the last 15, 20 years in an electronic format.

Turns out that most of you probably use learning algorithms - I don't know - I think half a dozen times a day, or maybe a dozen times a day or more, and often without knowing it. So, for example, every time you send mail via the US Postal System, it turns out there's an algorithm that tries to automatically read the zip code you wrote on your envelope, and that's done by a learning algorithm. So every time you send US mail, you are using a learning algorithm, perhaps without even being aware of it.
Similarly, every time you write a check - I actually don't know the number for this, but a significant fraction of checks that you write are processed by a learning algorithm that's learned to read the digits, the dollar amount that you wrote down on your check. So every time you write a check, there's another learning algorithm that you're probably using without even being aware of it.

If you use a credit card - or I know at least one phone company was doing this, and lots of companies like eBay as well that do electronic transactions - there's a good chance that there's a learning algorithm in the background trying to figure out if, say, your credit card's been stolen or if someone's engaging in a fraudulent transaction.

If you use a website like Amazon or Netflix that will often recommend books for you to buy or movies for you to rent or whatever, these are other examples of learning algorithms that have learned what sorts of things you like to buy or what sorts of movies you like to watch, and can therefore give customized recommendations to you.

Just about a week ago, I had my car serviced, and even there, my car mechanic was trying to explain to me some learning algorithm in the innards of my car that's sort of doing its best to optimize my driving performance for fuel efficiency or something. So, you see, most of us use learning algorithms half a dozen, a dozen, maybe dozens of times a day without even knowing it. And of course, learning algorithms are also doing things like giving us a growing understanding of the human genome. So if someday we ever find a cure for cancer, I bet learning algorithms will have had a large role in that. That's sort of the thing that Tom works on, yes?

So in teaching this class, I sort of have three goals. One of them is just to, I hope, convey some of my own excitement about machine learning to you. The second goal is that by the end of this class, I hope all of you will be able to apply state-of-the-art machine learning algorithms to whatever problems you're interested in. And if you ever need to build a system for reading zip codes, you'll know how to do that by the end of this class. And lastly, I realize that only a subset of you are interested in doing research in machine learning, but by the conclusion of this class, I hope that all of you will actually be well qualified to start doing research in machine learning, okay?

So let's say a few words about logistics.
The prerequisites of this class are written on one of the handouts, and are as follows. In this class, I'm going to assume that all of you have sort of basic knowledge of computer science and knowledge of basic computer skills and principles. So I assume all of you know what big-O notation is, that all of you know about data structures like queues, stacks, and binary trees, and that all of you have enough programming skills to, like, write a simple computer program. And it turns out that most of this class will not be very programming intensive, although we will do some programming, mostly in either MATLAB or Octave. I'll say a bit more about that later.

I also assume familiarity with basic probability and statistics. So most undergraduate statistics classes, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna assume all of you know what random variables are, that all of you know what expectation is, what the variance of a random variable is. And in case it's been a while since some of you have seen some of this material, at some of the discussion sections we'll actually go over some of the prerequisites, sort of as a refresher course on the prerequisite material. I'll say a bit more about that later as well.

Lastly, I also assume familiarity with basic linear algebra. And again, most undergraduate linear algebra courses are more than enough. So if you've taken courses like Math 51, 103, Math 113 or CS205 at Stanford, that would be more than enough. Basically, I'm gonna assume that all of you know what matrices and vectors are, that you know how to multiply matrices and vectors and multiply matrices and matrices, and that you know what a matrix inverse is. If you know what an eigenvector of a matrix is, that'd be even better. But if you don't quite know, or if you're not quite sure, that's fine, too. We'll go over it in the review sections.

So there are a couple more logistical things I should deal with in this class. One is that, as most of you know, CS229 is a televised class. And in fact, I guess many of you are probably watching this at home on TV, so I'm gonna say hi to our home viewers.
So earlier this year, I approached SCPD, which televises these classes, about trying to make a small number of Stanford classes publicly available by posting the videos on the web. And so this year, Stanford is actually starting a small pilot program in which we'll post videos of a small number of classes online, on the Internet, in a way that makes them publicly accessible to everyone. I'm very excited about that, because machine learning - let's get the word out there.

One of the consequences of this is that - let's see - videos or pictures of the students in this classroom will not be posted online, so don't worry about seeing your own face appear on YouTube one day. But the microphones may pick up your voices, so I guess the consequence of that is: because the microphones may pick up your voices, no matter how irritated you are at me, don't yell out swear words in the middle of class. But because there won't be video of you, you can safely sit there and make faces at me, and that won't show, okay?

Let's see. I also handed out - there were two handouts I hope most of you have, including the course information handout. So let me just say a few words about parts of these. On the third page, there's a section that says Online Resources.

Oh, okay. Louder? Actually, could you turn up the volume? Testing. Is this better? Testing, testing. Okay, cool. Thanks.

So, all right, online resources. The class has a home page; it's on the handouts, so I won't write it on the chalkboard: http://cs229.stanford.edu. And so when there are homework assignments or things like that - in the interest of saving trees - we will usually not give out many handouts in class. So homework assignments and homework solutions will be posted online at the course home page.

For this class, I've also written, and I guess I've also revised every year, a set of fairly detailed lecture notes that cover the technical content of this class. And so if you visit the course homepage, you'll also find the detailed lecture notes that go over in detail all the math and equations and so on that I'll be doing in class.
There's also a newsgroup, su.class.cs229, also written on the handout. This is a newsgroup that's sort of a forum for people in the class to get to know each other and have whatever discussions you want to have amongst yourselves. The class newsgroup will not be monitored by the TAs and me. But this is a place for you to form study groups or find project partners or discuss homework problems and so on - and it's not monitored by the TAs and me, so feel free to talk trash about this class there.

If you want to contact the teaching staff, please use the email address written down here, cs229-qa@cs.stanford.edu. This goes to an account that's read by all the TAs and me. So rather than sending us email individually, if you send email to this account, it will actually let us get back to you maximally quickly with answers to your questions. If you're asking questions about homework problems, please say in the subject line which assignment and which question the email refers to, since that will also help us route your question to the appropriate TA, or to me, and get the response back to you quickly.

Let's see. Skipping ahead - let's see - there are four homeworks, one midterm, and one open-ended term project. And notice the section on the honor code.

So one thing that I think will help you to succeed and do well in this class, and even help you to enjoy this class more, is if you form a study group. So start looking around where you're sitting now, or at the end of class today, mingle a little bit and get to know your classmates. I strongly encourage you to form study groups and sort of have a group of people to study with, a group of your fellow students to talk over these concepts with. You can also post on the class newsgroup if you want to use that to try to form a study group. But some of the problem sets in this class are reasonably difficult - people that have taken the class before may tell you they were very difficult. And I just think it would be more fun for you, and you'd probably have a better learning experience, if you form a study group of people to work with. So I definitely encourage you to do that.
And just to say a word on the honor code: I definitely encourage you to form a study group and work together, discuss homework problems together. But if you discuss homework problems with other students, then I'll ask you to sort of go home and write down your own solutions independently, without referring to notes that were taken in any of your joint study sessions. So in other words, when you turn in a homework problem, what you turn in should be something that was reconstructed independently by yourself, without referring to notes that you took during your study sessions with other people, okay? And obviously, showing your solutions to others or copying other solutions directly is right out.

We occasionally also reuse problem set questions from previous years, so that the problems are a bit more debugged and work more smoothly. And as a result of that, I also ask you not to look at solutions from previous years, and this includes both the official solutions that we've given out to previous generations of this class and solutions that people who have taken this class in previous years may have written out by themselves, okay?

Sadly, in previous years, there have often been a few honor code violations in this class. And last year, I think I prosecuted five honor code violations, which I think is a ridiculously large number. So just don't work from old solutions, and hopefully there'll be zero honor code violations this year. I'd love for that to happen.

The section here on the late homework policy, if you ever want to hand in a homework late - I'll leave you to read that yourself.

We also have a midterm, which is scheduled for Thursday, the 8th of November at 6:00 p.m., so please keep that evening free.

And let's see. One more administrative thing I wanted to say is about the class project. So part of the goal of this class is to leave you well equipped to apply machine learning algorithms to a problem or to do research in machine learning. And so as part of this class, I'll ask you to execute a small research project, sort of as a small term project.
And what most students do for this is either apply machine learning to a problem that you find interesting or investigate some aspect of machine learning. So to those of you that are either already doing research, or to those of you who are in industry - you're taking this from a company - one fantastic way to do a class project would be to apply machine learning algorithms to a problem that you're interested in, a problem that you're already working on, whether it be a science research problem or a problem in industry where you're trying to get a system to work using a learning algorithm.

To those of you that are not currently doing research, one great way to do a project would be to just pick a problem that you care about, a problem that you find interesting, and apply learning algorithms to that and play with the ideas and see what happens.

And let's see. Oh, and the goal of the project should really be for you to do a publishable piece of research in machine learning, okay?

And if you go to the course website, you'll actually find a list of the projects that students did last year. I'm holding the list in my hand; you can go home later and take a look at it online. But reading down this list, I see that last year there were students that applied learning algorithms to control a snake robot. There were a few projects on improving learning algorithms. There's a project on flying autonomous aircraft. There was a project, actually done by our TA Paul, on improving computer vision algorithms using machine learning. There were a couple of projects on Netflix rankings using learning algorithms; a few medical robots; one on segmenting [inaudible], on segmenting pieces of the body using learning algorithms; one on musical instrument detection; another on RNA sequence alignment; and a few algorithms on understanding the brain - neuroscience - actually quite a few projects on neuroscience; a couple of projects on understanding fMRI data, on brain scans, and so on; another project on market making, in financial trading. There was an interesting project on trying to use learning algorithms to decide what it is that makes a person's face physically attractive.
There's a learning algorithm on optical illusions, and so on. And it goes on - lots of fun projects. So take a look, then come up with your own ideas. But whatever you find cool and interesting, I hope you'll be able to make a machine learning project out of it. Yeah, question?

Student: Are these group projects?

Instructor (Andrew Ng): Oh, yes, thank you.

Student: So how many people can be in a group?

Instructor (Andrew Ng): Right. So projects can be done in groups of up to three people. So as part of forming study groups, later today as you get to know your classmates, I definitely also encourage you to grab two other people and form a group of up to three people for your project, okay? And just start brainstorming ideas for now amongst yourselves. You can also come and talk to me or the TAs if you want to brainstorm ideas with us.

Okay. So, one more organizational question. I'm curious: how many of you know MATLAB? Wow, cool, quite a lot. Okay. So as part of the - actually, how many of you know Octave or have used Octave? Oh, okay, a much smaller number.

So as part of this class, especially in the homeworks, we'll ask you to implement a few programs, a few machine learning algorithms, as part of the homeworks. And most of those homeworks will be done in either MATLAB or in Octave, which is sort of - I know some people call it a free version of MATLAB, which it sort of is, sort of isn't.

So I guess for those of you that haven't seen MATLAB before - and I know most of you have - MATLAB is, I guess, a programming language that makes it very easy to write code using matrices, to write code for numerical routines, to move data around, to plot data. And it's sort of an extremely easy-to-learn tool for implementing a lot of learning algorithms.

And in case some of you want to work on your own home computer or something, if you don't have a MATLAB license, for the purposes of this class there's also - [inaudible] write that down [inaudible] MATLAB - there's also a software package called Octave that you can download for free off the Internet. And it has somewhat fewer features than MATLAB, but it's free, and for the purposes of this class, it will work for just about everything.
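[Editor's note: a minimal Octave sketch of the matrix-style code described above; the data here is made up purely for illustration.]

```octave
% Solve a small linear system and plot a curve - the kind of
% matrix and plotting code MATLAB/Octave makes easy.
A = [1 2; 3 4];          % a 2x2 matrix
b = [5; 6];              % a column vector
x = A \ b;               % solve the linear system A*x = b

t = 0:0.1:2*pi;          % a range of values
plot(t, sin(t));         % plot a sine curve
xlabel('t'); ylabel('sin(t)');
```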
So, actually, just a side comment for those of you that haven't seen MATLAB before, I guess. Once, a colleague of mine at a different university, not at Stanford, who teaches another machine learning course - he's taught it for many years - so one day, he was in his office, and an old student of his from, like, ten years ago came into his office, and the student said, "Oh, professor, professor, thank you so much for your machine learning class. I learned so much from it. There's this stuff that I learned in your class that I now use every day. And it's helped me make lots of money, and here's a picture of my big house."

So my friend was very excited. He said, "Wow. That's great. I'm glad to hear this machine learning stuff was actually useful. So what was it that you learned? Was it logistic regression? Was it the PCA? Was it the Bayesian networks? What was it that you learned that was so helpful?" And the student said, "Oh, it was the MATLAB."

So for those of you that don't know MATLAB yet, I hope you do learn it. It's not hard, and we'll actually have a short MATLAB tutorial in one of the discussion sections for those of you that don't know it.

Okay. The very last logistical thing is the discussion sections. So discussion sections will be taught by the TAs, and attendance at discussion sections is optional, although they'll also be recorded and televised. And we'll use the discussion sections mainly for two things. For the next two or three weeks, we'll use the discussion sections to go over the prerequisites to this class - if some of you haven't seen probability or statistics for a while, or linear algebra - we'll go over those in the discussion sections as a refresher for those of you that want one.

Later in this quarter, we'll also use the discussion sections to go over extensions of the material that I'm teaching in the main lectures. So machine learning is a huge field, and there are a few extensions that we really want to teach but didn't have time in the main lectures for.
So later this quarter, we'll use the discussion sections to talk about things like convex optimization, and to talk a little bit about hidden Markov models, which are a type of machine learning algorithm for modeling time series, and a few other things - so, extensions to the material that I'll be covering in the main lectures. And attendance at the discussion sections is optional, okay?

So that was all I had on logistics. Before we move on to start talking a bit about machine learning, let me check what questions you have.

Student: [Inaudible] R or something like that?

Instructor (Andrew Ng): Oh, yeah, let's see, right. So our policy has been that you're welcome to use R, but I would strongly advise against it, mainly because in the last problem set, we actually supply some code that will run in Octave but that would be somewhat painful for you to translate into R yourself. So for your other assignments, if you wanna submit a solution in R, that's fine. But I think MATLAB is actually totally worth learning. I know R and MATLAB, and I personally end up using MATLAB quite a bit more often, for various reasons.

Student: For the [inaudible] project [inaudible]?

Instructor (Andrew Ng): So for the term project, you're welcome to do it in groups of three, or you're welcome to do it by yourself, or in groups of two. Grading is the same regardless of the group size. I recommend trying to form a team, but it's actually totally fine to do it in a smaller group if you want.

Student: [Inaudible] what language [inaudible]?

Instructor (Andrew Ng): So let's see. There is no C programming in this class, other than any that you may choose to do yourself in your project. So all the homeworks can be done in MATLAB or Octave. And let's see - I guess the programming prerequisite is more the ability to understand big-O notation, and knowledge of what a data structure like a linked list or a queue or a binary tree is, more so than your knowledge of C or Java specifically.

Student: Looking at the end-of-semester project, I mean, what exactly will you be testing over there? [Inaudible]?

Instructor (Andrew Ng): Of the project?

Student: Yeah.

Instructor (Andrew Ng): Yeah, let me answer that later.
415 00:30:20,850 --> 00:30:24,980 In a couple of weeks, I shall give out a handout with guidelines for the project. 416 00:30:25,380 --> 00:30:27,350 But for now, we should think of the goal as 417 00:30:27,790 --> 00:30:32,650 being to do a cool piece of machine learning work that will let you experience 418 00:30:33,000 --> 00:30:35,260 the joys of machine learning firsthand 419 00:30:35,790 --> 00:30:39,970 and really try to think about doing a publishable piece of work. 420 00:30:40,460 --> 00:30:45,040 So many students will try to build a cool machine learning application. That's probably the most common project. 421 00:30:45,650 --> 00:30:47,020 Some students will try to 422 00:30:47,440 --> 00:30:52,890 improve state-of-the-art machine learning. Some of those projects are also very successful. It's a little 423 00:30:53,250 --> 00:30:54,090 bit harder to do. 424 00:30:54,580 --> 00:30:58,810 And there's also a smaller minority of students that will sometimes try to prove 425 00:30:59,690 --> 00:31:03,110 - develop the theory of machine learning further or try to prove theorems about machine learning. 426 00:31:05,210 --> 00:31:07,450 So they're usually great projects of all of those 427 00:31:07,670 --> 00:31:10,400 types with applications and machine learning being the most common. Anything else? 428 00:31:17,400 --> 00:31:19,900 Okay, cool. So 429 00:31:21,390 --> 00:31:27,600 that was it for logistics. Let's talk about learning algorithms. 430 00:31:29,380 --> 00:31:31,240 So can I have the laptop display, please, 431 00:31:33,220 --> 00:31:33,540 or the projector? 432 00:31:41,810 --> 00:31:46,530 Actually, could you lower the big screen? Cool. 433 00:31:56,240 --> 00:31:59,420 This is amazing customer service. Thank you. 434 00:32:01,200 --> 00:32:04,820 I see. Okay, cool. Okay. No, that's fine. I see. 435 00:32:07,410 --> 00:32:08,550 Okay. That's cool. Thanks. Okay. 436 00:32:08,930 --> 00:32:14,740 Big screen isn't working today, but I hope you can read things on the smaller screens out there. Actually, [inaudible] I think 437 00:32:15,200 --> 00:32:17,380 this room just got a new projector that 438 00:32:17,880 --> 00:32:18,160 439 00:32:18,890 --> 00:32:25,380 - someone sent you an excited email - was it just on Friday? - saying we just got a new projector and they said 440 00:32:25,910 --> 00:32:29,070 4,000-to-1 something or other brightness ratio. I don't know. 441 00:32:29,550 --> 00:32:34,690 Someone was very excited about the new projector in this room, but I guess we'll see that in operation on Wednesday. 442 00:32:35,450 --> 00:32:36,600 So 443 00:32:37,600 --> 00:32:42,380 start by talking about what machine learning is. What is machine learning? 444 00:32:43,510 --> 00:32:45,600 Actually, can you read ... 445 00:32:45,880 --> 00:32:49,930 can you read the text out there? Raise your hand if the text on the small screens is legible. 446 00:32:52,060 --> 00:32:55,710 Oh, okay, cool, mostly legible. Okay. So I'll just read it out. 447 00:32:56,410 --> 00:32:58,190 So what is machine learning? 448 00:32:59,040 --> 00:33:02,720 Way back in about 1959, Arthur Samuel 449 00:33:03,220 --> 00:33:09,700 defined machine learning informally as the [inaudible|field study] that gives computers to learn 450 00:33:11,120 --> 00:33:15,340 - [inaudible] that gives computers the ability to learn without being explicitly programmed. 
451 00:33:16,220 --> 00:33:22,330 So Arthur Samuel, so way back in the history of machine learning, actually did something 452 00:33:23,750 --> 00:33:27,550 very cool, which was he wrote a checkers program, 453 00:33:28,350 --> 00:33:30,860 which would play games of checkers against itself. 454 00:33:31,800 --> 00:33:33,860 And so because 455 00:33:34,590 --> 00:33:38,700 a computer can play thousands of games against itself relatively quickly, 456 00:33:39,270 --> 00:33:43,160 Arthur Samuel had his program play thousands of games against itself, 457 00:33:43,770 --> 00:33:50,790 and over time it would start to learn to recognize patterns which led to wins and patterns which led to losses. 458 00:33:51,630 --> 00:33:53,130 So over time it learned things like that, 459 00:33:53,600 --> 00:33:57,470 "Gee, if I get a lot of pieces taken by the opponent, then 460 00:33:57,920 --> 00:34:01,680 I'm more likely to lose than win," or, "Gee, if I get my pieces into a certain position, 461 00:34:02,190 --> 00:34:03,630 then I'm especially likely 462 00:34:04,270 --> 00:34:04,920 to win rather than lose." 463 00:34:05,990 --> 00:34:08,170 And so over time, Arthur Samuel 464 00:34:08,820 --> 00:34:12,850 had a checkers program that would actually learn to play checkers by learning what are the 465 00:34:13,280 --> 00:34:19,350 sort of board positions that tend to be associated with wins and what are the board positions that tend to be associated with losses. 466 00:34:20,060 --> 00:34:20,390 And 467 00:34:21,280 --> 00:34:24,110 way back around 1959, 468 00:34:24,670 --> 00:34:26,850 the amazing thing about this was that 469 00:34:27,200 --> 00:34:31,760 his program actually learned to play checkers much better than Arthur Samuel himself could. 470 00:34:33,570 --> 00:34:33,780 So 471 00:34:34,360 --> 00:34:40,320 even today, there are some people that say, well, computers can't do anything that they're not explicitly programmed to. 472 00:34:40,770 --> 00:34:42,830 And Arthur Samuel's checkers program was 473 00:34:43,320 --> 00:34:44,430 maybe the first 474 00:34:45,420 --> 00:34:50,140 I think really convincing refutation of this claim. Namely, 475 00:34:50,950 --> 00:34:55,750 Arthur Samuel managed to write a checkers program that could play checkers much better than he personally could, 476 00:34:56,530 --> 00:35:01,340 and this is an instance of maybe computers learning to do things that they were not programmed explicitly to do. 477 00:35:03,430 --> 00:35:03,670 478 00:35:04,500 --> 00:35:11,210 Here's a more recent, a more modern, more formal definition of machine learning due to Tom Mitchell, 479 00:35:11,920 --> 00:35:17,070 who says that a well-posed learning problem is defined as follows: 480 00:35:17,810 --> 00:35:18,830 He says that 481 00:35:20,100 --> 00:35:25,490 a computer program is set to learn from an experience E with respect to some task T 482 00:35:26,130 --> 00:35:27,730 and some performance measure P 483 00:35:27,970 --> 00:35:31,760 if its performance on T as measured by P improves with experience E. 484 00:35:32,590 --> 00:35:35,190 So not only is it a definition, it even rhymes. 485 00:35:36,100 --> 00:35:36,740 486 00:35:38,180 --> 00:35:41,330 So, for example, in the case of checkers, 487 00:35:41,940 --> 00:35:45,670 the experience E that a program has would be 488 00:35:46,130 --> 00:35:49,990 the experience of playing lots of games of checkers against itself, say. 
The task T is the task of playing checkers, and the performance measure P will be something like the fraction of games it wins against a certain set of human opponents. And by this definition, we'll say that Arthur Samuel's checkers program has learned to play checkers, okay?

So as an overview of what we're going to do in this class: this class is sort of organized into four major sections. We're gonna talk about four major topics in this class, the first of which is supervised learning. So let me give you an example of that.

Suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, actually collected a data set for me last week, to use in an example later. Suppose that you go to collect statistics about how much houses cost in a certain geographic area. And Dan, the TA, collected data on housing prices in Portland, Oregon. So what you can do is, let's say, plot the square footage of the house against the list price of the house. You collect data on a bunch of houses, and let's say you get a data set like this, with houses of different sizes that are listed for different amounts of money.

Now, let's say that I'm trying to sell a house in the same area as Portland, Oregon, where the data comes from. Let's say I have a house that's this size in square footage, and I want an algorithm to tell me about how much I should expect my house to sell for. So there are lots of ways to do this, and some of you may have seen elements of what I'm about to say before.

So one thing you could do is look at this data and maybe fit a straight line to it. And then if this is my house, you may then look at the straight line and predict that my house is gonna go for about that much money, right? There are other decisions that we can make, which we'll talk about later, which is, well, what if I don't wanna fit a straight line? Maybe I should fit a quadratic function to it. Maybe that fits the data a little bit better. You notice that if you do that, the price of my house goes up a bit, so that'd be nice.

And this sort of learning problem, of learning to predict housing prices, is an example of what's called a supervised learning problem.
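[Editor's note: a minimal Octave/MATLAB sketch of the straight-line and quadratic fits just described; the housing numbers are made up purely for illustration.]

```octave
% Fit a line and a quadratic to made-up housing data, then
% predict the price of a new house from each fit.
sqft  = [1000 1500 2000 2500 3000];   % square footage of five houses
price = [200  245  320  390  430];    % list prices, in thousands of dollars

theta_line = polyfit(sqft, price, 1); % least-squares straight-line fit
theta_quad = polyfit(sqft, price, 2); % least-squares quadratic fit

my_house = 2200;                      % the house we want to price
polyval(theta_line, my_house)         % prediction from the straight line
polyval(theta_quad, my_house)         % prediction from the quadratic
```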
526 00:38:38,080 --> 00:38:38,320 527 00:38:38,790 --> 00:38:44,380 And the reason that it's called supervised learning is because we're providing the algorithm a data set of 528 00:38:45,490 --> 00:38:50,730 a bunch of square footages, a bunch of housing sizes, and as well as sort of the right answer 529 00:38:51,090 --> 00:38:52,870 of what the actual prices 530 00:38:53,510 --> 00:38:54,560 of a number of houses were, right? 531 00:38:55,690 --> 00:38:59,160 So we call this supervised learning because we're supervising the algorithm or, in other words, 532 00:38:59,610 --> 00:39:02,940 we're giving the algorithm the, quote, right answer 533 00:39:03,440 --> 00:39:04,370 for a number of houses. 534 00:39:04,970 --> 00:39:10,960 And then we want the algorithm to learn the association between the inputs and the outputs and to sort of give us more of the right answers, 535 00:39:11,970 --> 00:39:13,730 okay? 536 00:39:14,770 --> 00:39:20,560 It turns out this specific example that I drew here is an example of something called a regression problem. 537 00:39:20,870 --> 00:39:23,010 And 538 00:39:23,570 --> 00:39:29,250 the term regression sort of refers to the fact that the variable you're trying to predict is a continuous value ... 539 00:39:29,850 --> 00:39:31,730 continuous value and price. 540 00:39:32,420 --> 00:39:34,660 541 00:39:36,340 --> 00:39:36,940 542 00:39:37,710 --> 00:39:43,030 There's another class of supervised learning problems which we'll talk about, which are classification problems. 543 00:39:43,440 --> 00:39:45,200 And so, 544 00:39:49,040 --> 00:39:52,790 in a classification problem, the variable you're trying to predict is 545 00:39:53,410 --> 00:39:57,960 discrete rather than continuous. So as one specific example 546 00:39:58,520 --> 00:39:58,750 547 00:39:59,220 --> 00:40:04,510 - so actually a standard data set you can download online [inaudible] that lots of machine learning people have played with. 548 00:40:04,890 --> 00:40:08,750 Let's say you collect a data set on breast cancer tumors, 549 00:40:09,140 --> 00:40:10,920 and you want to know ... 550 00:40:11,350 --> 00:40:11,840 551 00:40:12,340 --> 00:40:18,600 learn the algorithm to predict whether or not a certain tumor is malignant. 552 00:40:19,070 --> 00:40:21,950 Malignant is the opposite of benign, right, so malignancy is 553 00:40:22,430 --> 00:40:23,430 a sort of harmful, bad tumor. 554 00:40:24,090 --> 00:40:30,430 So we collect some number of features, some number of properties of these tumors, and for the sake of 555 00:40:31,090 --> 00:40:36,880 sort of having a simple [inaudible] explanation, let's just say that we're going to look at 556 00:40:37,490 --> 00:40:42,280 the size of the tumor and depending on the size of the tumor, we'll try to figure out whether or not 557 00:40:42,940 --> 00:40:47,600 the tumor is malignant or benign. So the tumor is either malignant or benign, 558 00:40:48,160 --> 00:40:48,600 and so 559 00:40:49,010 --> 00:40:55,190 the variable in the Y-axis is either zero or 1, and so your data set may look something like ... 560 00:40:56,000 --> 00:41:02,240 may look something like that, right? 561 00:41:03,350 --> 00:41:07,620 And that's 1 and that's zero, okay? 562 00:41:08,870 --> 00:41:13,960 And so this is an example of a classification problem 563 00:41:14,710 --> 00:41:16,690 where 564 00:41:17,500 --> 00:41:17,850 565 00:41:19,350 --> 00:41:26,310 the variable you're trying to predict is a discreet value. It's either zero or 1. 
And in fact, more generally, there will be many learning problems where we'll have more than one input variable, more than one input feature, and use more than one variable to try to predict, say, whether a tumor is malignant or benign. So, for example, continuing with this, you may instead have a data set that looks like this. I'm gonna plot this data set in a slightly different way now. And I'm making this data set look much cleaner than it really is in reality, for illustration, okay?

For example, maybe the crosses indicate malignant tumors and the O's may indicate benign tumors. And so you may have a data set comprising patients of different ages who have different tumor sizes, where a cross indicates a malignant tumor and an O indicates a benign tumor. And you may want an algorithm to learn to predict, given a new patient, whether their tumor is malignant or benign.

So, for example, what a learning algorithm may do is maybe come in and decide that a straight line like that separates the two classes of tumors really well (a small code sketch of this kind of two-feature classifier appears at the end of this passage), and so if you have a new patient whose age and tumor size fall over there, then the algorithm may predict that the tumor is benign rather than malignant, okay?

So this is just another example of another supervised learning problem, and another classification problem. And so it turns out that one of the issues we'll talk about later in this class is that in this specific example, we're going to try to predict whether a tumor is malignant or benign based on two features, or based on two inputs, namely the age of the patient and the tumor size. It turns out that when you look at a real data set, you find that learning algorithms often use other sets of features. In the breast cancer data example, you would also use properties of the tumors, like clump thickness, uniformity of cell size, uniformity of cell shape, [inaudible] adhesion and so on - various other medical properties.
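[Editor's note: the sketch referenced above - a minimal two-feature classifier in Octave/MATLAB, using logistic regression (covered later in the course) as one way to learn a linear decision boundary; the patient data is made up purely for illustration.]

```octave
% Learn a linear boundary separating benign from malignant tumors
% from two features: patient age and tumor size.
X = [40 1.2; 35 0.8; 45 1.0;    % [age, tumor size]: benign examples
     55 3.0; 62 3.5; 50 2.8];   % [age, tumor size]: malignant examples
y = [0; 0; 0; 1; 1; 1];         % labels: 0 = benign, 1 = malignant

Xb = [ones(size(X,1),1) X];     % add an intercept term
theta = zeros(3,1);             % boundary parameters, initially zero
alpha = 0.001;                  % learning rate
for iter = 1:5000               % batch gradient ascent on the log-likelihood
  h = 1 ./ (1 + exp(-Xb*theta));         % sigmoid of the linear score
  theta = theta + alpha * Xb' * (y - h); % move toward better fit
end

% Predict for a new patient, age 48, with a tumor of size 2.5:
1 / (1 + exp(-[1 48 2.5]*theta))         % estimated probability of malignancy
```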
603 00:43:56,040 --> 00:43:57,140 And 604 00:43:57,980 --> 00:44:01,130 one of the most interesting things we'll talk about later this quarter is, 605 00:44:01,650 --> 00:44:01,870 606 00:44:02,280 --> 00:44:07,320 what if your data doesn't lie in a two-dimensional or three-dimensional or sort of even a finite-dimensional space 607 00:44:08,090 --> 00:44:09,090 608 00:44:09,790 --> 00:44:11,950 - what if your data actually lies in an infinite-dimensional space? 609 00:44:13,080 --> 00:44:14,700 Our plots here are in two-dimensional space. 610 00:44:15,620 --> 00:44:20,620 I can't plot you an infinite-dimensional space, right? And so it turns out that 611 00:44:21,830 --> 00:44:25,750 one of the most successful classes of machine learning algorithms - called support vector machines - 612 00:44:26,410 --> 00:44:29,440 actually takes data and 613 00:44:30,030 --> 00:44:34,760 maps it to an infinite-dimensional space and then does classification using not 614 00:44:34,970 --> 00:44:37,380 two features like I've done here, but an infinite number of features. 615 00:44:38,090 --> 00:44:43,080 And that will actually be one of the most fascinating things we talk about when we go deeply into classification algorithms. 616 00:44:44,010 --> 00:44:48,920 And it's actually an interesting question, right, so think about how do you even represent an infinite-dimensional vector 617 00:44:49,510 --> 00:44:54,400 in computer memory? You don't have an infinite amount of computer memory. How do you even represent a point 618 00:44:55,020 --> 00:44:56,270 that lies in an infinite-dimensional space? 619 00:44:57,460 --> 00:45:00,250 We'll talk about that when we get to 620 00:45:01,970 --> 00:45:03,930 support vector machines, okay? 621 00:45:04,290 --> 00:45:08,470 So let's see. 622 00:45:17,570 --> 00:45:18,630 623 00:45:19,580 --> 00:45:20,700 624 00:45:21,450 --> 00:45:26,860 So that was supervised learning. The second of the four major topics of this class will be 625 00:45:27,290 --> 00:45:29,330 learning theory. 626 00:45:31,400 --> 00:45:32,000 So 627 00:45:32,990 --> 00:45:37,090 I have a friend who teaches math at a different university, not at Stanford, and 628 00:45:37,620 --> 00:45:42,210 when you talk to him about his work and what he's really out to do, 629 00:45:42,640 --> 00:45:44,130 this friend of mine will 630 00:45:44,700 --> 00:45:49,150 - he's a math professor, right? - this friend of mine will sort of get a look of wonder in his eyes, and he'll 631 00:45:49,620 --> 00:45:51,800 tell you about how in his mathematical work, 632 00:45:52,210 --> 00:45:56,760 he feels like he's discovering truth and beauty in the universe. And he says it 633 00:45:57,280 --> 00:46:00,150 in sort of a really touching, sincere way, and then he has this 634 00:46:00,950 --> 00:46:07,900 - you can see it in his eyes - he has this deep appreciation of the truth and beauty in the universe as revealed to him by the math he does. 635 00:46:08,840 --> 00:46:14,280 In this class, I'm not gonna do any truth and beauty. 636 00:46:15,350 --> 00:46:21,730 In this class, I'm gonna talk about learning theory to try to convey to you an understanding of how and why learning algorithms 637 00:46:22,060 --> 00:46:23,000 work, 638 00:46:23,510 --> 00:46:28,060 so that we can apply these learning algorithms as effectively as possible. 639 00:46:28,760 --> 00:46:29,000 640 00:46:29,850 --> 00:46:31,580 So, for example,
641 00:46:31,980 --> 00:46:38,110 it turns out you can prove surprisingly deep theorems on when you can guarantee that a learning algorithm will work, all right? 642 00:46:38,490 --> 00:46:40,540 So think about a learning algorithm for reading zip codes. 643 00:46:41,330 --> 00:46:42,710 When can you prove a theorem 644 00:46:43,190 --> 00:46:48,130 guaranteeing that a learning algorithm will be at least 99.9 percent accurate on reading zip codes? 645 00:46:48,500 --> 00:46:50,430 This is actually somewhat surprising, but we can actually prove theorems 646 00:46:51,070 --> 00:46:53,140 showing when you can expect that to hold. 647 00:46:54,100 --> 00:46:58,770 We'll also sort of delve into learning theory to try to understand which algorithms can 648 00:46:59,050 --> 00:47:01,410 approximate different functions well, and when, 649 00:47:01,880 --> 00:47:08,410 and also try to understand things like how much training data you need. So how many examples of houses do you need 650 00:47:08,920 --> 00:47:11,580 in order for your learning algorithm to recognize the pattern 651 00:47:12,060 --> 00:47:15,820 between the square footage of a house and its housing price? 652 00:47:16,380 --> 00:47:17,840 And this will help us answer questions like, 653 00:47:18,710 --> 00:47:23,040 if you're trying to design a learning algorithm, should you be spending more time collecting more data, 654 00:47:23,440 --> 00:47:26,790 or is it the case that you already have enough data and it would be a waste of time to try to collect more? 655 00:47:27,570 --> 00:47:29,840 Okay? So 656 00:47:30,740 --> 00:47:35,630 I think learning algorithms are a very powerful tool, and as I 657 00:47:36,160 --> 00:47:41,300 walk around sort of industry in Silicon Valley or as I work with various businesses in CS and outside CS, 658 00:47:42,410 --> 00:47:45,150 I find that there's often a huge difference between 659 00:47:45,710 --> 00:47:47,910 how well someone who really understands this stuff 660 00:47:48,540 --> 00:47:52,820 can apply a learning algorithm versus someone who sort of gets it but sort of doesn't. 661 00:47:54,220 --> 00:47:55,770 The analogy I like to think of is, 662 00:47:56,200 --> 00:48:02,220 imagine you were going to a carpentry school instead of a machine learning class, right? If you go to a carpentry school, 663 00:48:02,740 --> 00:48:06,900 they can give you the tools of carpentry. They'll give you a hammer, a bunch of nails, a screwdriver or whatever. 664 00:48:07,580 --> 00:48:12,690 But a master carpenter will be able to use those tools far better than most of us in this room. 665 00:48:13,210 --> 00:48:15,690 I know a carpenter can do things with a hammer and nail that 666 00:48:15,920 --> 00:48:16,450 I couldn't possibly. 667 00:48:17,430 --> 00:48:23,760 And it's actually a little bit like that in machine learning, too. One thing that's sadly not taught in many courses on machine learning is 668 00:48:24,360 --> 00:48:28,550 how to take the tools of machine learning and really, really apply them well. So, 669 00:48:29,350 --> 00:48:29,870 in the same way 670 00:48:30,240 --> 00:48:36,820 - though the tools of machine learning are, I wanna say, quite a bit more advanced than the tools of carpentry; maybe a carpenter will disagree - 671 00:48:37,410 --> 00:48:43,160 a large part of this class will be just giving you the raw tools of machine learning, just the algorithms and so on.
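To give the "how much training data do you need?" question a concrete shape, here is a hedged little Octave experiment on synthetic housing data (every number invented): fit the model on more and more examples and watch the error on held-out examples.

    % Learning-curve sketch on synthetic data: when held-out error
    % stops improving, collecting more data is unlikely to help.
    m_total = 200;
    sqft  = 1000 + 2000*rand(m_total,1);            % invented house sizes
    price = 100 + 0.15*sqft + 20*randn(m_total,1);  % invented linear "world" plus noise
    test  = 151:200;                                % hold out the last 50 examples
    Xt = [ones(numel(test),1) sqft(test)];
    for m = [5 10 20 50 100 150]
      X = [ones(m,1) sqft(1:m)];
      theta = X \ price(1:m);                       % least-squares fit on m examples
      err = mean((Xt*theta - price(test)).^2);      % held-out mean squared error
      fprintf('m = %3d   test MSE = %8.2f\n', m, err);
    end

The flattening of that error curve as m grows is the empirical face of the sample-complexity questions that learning theory makes precise.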
672 00:48:43,860 --> 00:48:49,940 But what I plan to do throughout this entire quarter, not just in the segment on learning theory, but actually as 673 00:48:50,460 --> 00:48:54,320 a theme running through everything I do this quarter, will be to try to convey to you 674 00:48:54,960 --> 00:49:00,320 the skills to really take the learning algorithm ideas and really get them to work on a problem. 675 00:49:01,240 --> 00:49:02,720 It's sort of hard for me 676 00:49:03,420 --> 00:49:03,630 677 00:49:04,230 --> 00:49:09,810 to stand here and say how big a deal that is, but when I walk around companies in Silicon Valley, 678 00:49:10,300 --> 00:49:13,630 it's not at all uncommon for me to see someone using 679 00:49:14,370 --> 00:49:20,390 some machine learning algorithm and, when they explain to me what they've been doing for the last six months, I go, oh, gee, 680 00:49:21,290 --> 00:49:25,350 it should have been obvious from the start that for the last six months you've been wasting your time, right? 681 00:49:25,890 --> 00:49:27,370 682 00:49:27,910 --> 00:49:33,420 And so my goal in this class, running through the entire quarter, not just in learning theory, is actually 683 00:49:33,790 --> 00:49:35,510 not only to give you the tools of machine learning, 684 00:49:36,050 --> 00:49:38,810 but to teach you how to use them well. And 685 00:49:39,410 --> 00:49:42,690 I've noticed this is something that really not many other classes teach. 686 00:49:43,780 --> 00:49:48,260 And this is something I'm really convinced is a huge deal, and so 687 00:49:48,800 --> 00:49:55,420 by the end of this class, I hope all of you will be master carpenters. I hope all of you will be really good at applying these learning algorithms 688 00:49:56,300 --> 00:49:58,540 and getting them to work amazingly well on many problems. Okay? 689 00:50:01,270 --> 00:50:06,630 Let's see. So [inaudible] the board. 690 00:50:09,230 --> 00:50:11,950 After learning theory, 691 00:50:13,970 --> 00:50:17,320 there's another class of learning algorithms that I then want to 692 00:50:17,900 --> 00:50:21,800 teach you about, and that's unsupervised learning. 693 00:50:24,250 --> 00:50:27,110 694 00:50:27,780 --> 00:50:30,800 695 00:50:31,470 --> 00:50:34,140 So you recall, right, 696 00:50:35,250 --> 00:50:40,910 a little earlier I drew an example like this, right, where you have 697 00:50:41,330 --> 00:50:46,240 a couple of features, a couple of input variables, and sort of malignant tumors and benign tumors or whatever. 698 00:50:46,690 --> 00:50:49,280 And that was an example of a supervised learning problem because 699 00:50:49,850 --> 00:50:53,800 the data you have gives you the right answer for each of your patients. 700 00:50:54,130 --> 00:50:57,560 The data tells you this patient has a malignant tumor; this patient has a benign tumor. 701 00:50:58,250 --> 00:51:01,970 So it had the right answers, and you wanted the algorithm to just produce more of the same. 702 00:51:03,110 --> 00:51:08,440 In contrast, in an unsupervised learning problem, 703 00:51:09,070 --> 00:51:10,240 this is the sort of data you get, 704 00:51:10,680 --> 00:51:16,810 okay? 705 00:51:17,700 --> 00:51:18,050 706 00:51:19,210 --> 00:51:20,190 Where, 707 00:51:20,600 --> 00:51:28,280 speaking loosely, you're given a data set, and I'm not gonna tell you what the right answer is on any of your data. I'm just gonna give you a data set and I'm gonna say, 708 00:51:28,760 --> 00:51:31,050 "Would you please find interesting structure in this data set?"
709 00:51:31,970 --> 00:51:35,780 So that's the unsupervised learning problem, where you're sort of not given the right answer for everything. 710 00:51:36,400 --> 00:51:40,540 So, for example, an algorithm may 711 00:51:42,230 --> 00:51:42,750 find 712 00:51:43,750 --> 00:51:45,600 structure in the data in the form of 713 00:51:46,190 --> 00:51:48,800 the data being partitioned into two clusters; 714 00:51:49,280 --> 00:51:54,440 clustering is sort of one example of an unsupervised learning problem. 715 00:51:55,340 --> 00:51:57,460 So 716 00:51:58,930 --> 00:51:59,310 I hope you can see this. 717 00:52:00,070 --> 00:52:01,390 It turns out that these sorts of 718 00:52:01,670 --> 00:52:04,860 unsupervised learning algorithms are also used in many problems. 719 00:52:05,110 --> 00:52:09,340 This is a screen shot - this is a picture I got from Sue Emvee, who's a PhD student here, 720 00:52:09,870 --> 00:52:13,250 who is applying unsupervised learning algorithms to try to understand gene data, 721 00:52:14,450 --> 00:52:19,380 so the idea is to look at genes as individuals and group them into clusters based on ... 722 00:52:19,810 --> 00:52:24,960 based on properties of how the genes respond to different experiments. 723 00:52:26,150 --> 00:52:29,650 Another interesting application of 724 00:52:30,430 --> 00:52:35,970 [inaudible] sorts of clustering algorithms is actually image processing; this is one I got from Steve Gules, who's another PhD student. 725 00:52:36,590 --> 00:52:40,480 It turns out what you can do is, if you give this sort of 726 00:52:41,250 --> 00:52:44,770 data, say an image, to certain unsupervised learning algorithms, 727 00:52:45,560 --> 00:52:48,070 they will then learn to group pixels together 728 00:52:48,780 --> 00:52:54,060 and say, gee, these pixels seem to belong together, and those pixels seem to belong together. 729 00:52:54,750 --> 00:52:57,390 And so the images you see on the bottom 730 00:52:57,940 --> 00:52:59,650 - I guess you can just barely see them on there - 731 00:53:01,140 --> 00:53:04,700 so the images you see on the bottom are 732 00:53:05,820 --> 00:53:09,860 groupings - are what the algorithm has done to group certain pixels together. 733 00:53:10,320 --> 00:53:14,050 On a small display, it might be easier to just look at the image on the right. 734 00:53:14,530 --> 00:53:15,680 The two images on the bottom 735 00:53:16,250 --> 00:53:20,860 are two sort of identical visualizations of the same grouping ... 736 00:53:21,200 --> 00:53:22,940 grouping of the pixels 737 00:53:23,510 --> 00:53:24,840 into contiguous regions. 738 00:53:25,640 --> 00:53:26,060 739 00:53:26,660 --> 00:53:30,820 And so it turns out that this sort of clustering algorithm, or this sort of unsupervised learning algorithm, 740 00:53:31,370 --> 00:53:33,300 which learns to group pixels together, 741 00:53:33,820 --> 00:53:38,870 turns out to be useful for many applications in vision, in computer vision and image processing. 742 00:53:39,620 --> 00:53:44,890 I'll just show you one example, and this is a rather cool one that two students, Ashutosh Saxena and Min Sun here, did, 743 00:53:45,510 --> 00:53:47,020 which is: given an image like this, 744 00:53:47,460 --> 00:53:50,290 right? This is actually a picture taken of the Stanford campus. 745 00:53:50,830 --> 00:53:56,060 You can apply that sort of clustering algorithm and group the picture into regions.
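Clustering itself is easy to sketch. As a toy illustration - not the algorithm those students actually ran - here is a minimal k-means loop in Octave on synthetic two-dimensional data:

    % Minimal k-means sketch: partition unlabeled 2-D points into k clusters.
    % (A toy: it ignores the empty-cluster edge case.)
    X = [randn(50,2); randn(50,2) + 4];     % two invented, well-separated blobs
    k = 2;
    mu = X(randperm(size(X,1), k), :);      % initialize centroids at random data points
    for it = 1:20
      d = zeros(size(X,1), k);
      for j = 1:k
        d(:,j) = sum((X - mu(j,:)).^2, 2);  % squared distance of every point to centroid j
      end
      [~, c] = min(d, [], 2);               % assignment step: nearest centroid wins
      for j = 1:k
        mu(j,:) = mean(X(c == j, :), 1);    % update step: centroid moves to its points' mean
      end
    end

Nothing in that loop ever sees a label; the "structure" it finds is just the partition of the points into groups.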
746 00:53:56,800 --> 00:53:57,520 Let me actually 747 00:54:00,510 --> 00:54:02,060 blow that up so that you can see it more clearly. Okay. 748 00:54:20,750 --> 00:54:26,940 So in the middle, you see the lines sort of grouping the image together, grouping the image into contiguous regions. 749 00:54:27,440 --> 00:54:31,720 And what Ashutosh and Min did was they then applied a learning algorithm to say, 750 00:54:32,280 --> 00:54:34,150 can we take this clustering 751 00:54:34,910 --> 00:54:35,460 and use it 752 00:54:36,320 --> 00:54:40,340 to build a 3D model of the world? 753 00:54:41,810 --> 00:54:44,680 And so, using the clustering, 754 00:54:47,520 --> 00:54:50,070 they then had a learning algorithm try to learn 755 00:54:51,620 --> 00:54:53,640 what the 3D structure of the world looks like, 756 00:54:56,580 --> 00:55:00,520 so that they could come up with a 3D model that you can sort of fly through, okay? 757 00:55:05,180 --> 00:55:10,460 Many people used to think it's not possible to take a single image and build a 3D model from it, 758 00:55:13,020 --> 00:55:13,410 but 759 00:55:15,040 --> 00:55:18,270 using a learning algorithm - with that sort of clustering algorithm as the first step - 760 00:55:23,730 --> 00:55:24,150 they were able to. 761 00:55:24,700 --> 00:55:30,250 I'll just show you one more example. I like this because it's a picture of Stanford, with our beautiful Stanford campus. 762 00:55:31,140 --> 00:55:37,920 So again, taking the same sort of clustering algorithms, taking the same sort of unsupervised learning algorithm, you can group the pixels into different regions. 763 00:55:38,780 --> 00:55:40,770 And using that as a pre-processing step, 764 00:55:42,300 --> 00:55:43,020 they eventually built 765 00:55:44,280 --> 00:55:49,720 this sort of 3D model of the Stanford campus from a single picture. You can sort of walk into the scene, 766 00:55:51,860 --> 00:55:52,140 look around the campus. Okay? 767 00:55:57,460 --> 00:56:00,160 This actually turned out to be a mix of supervised and unsupervised learning, 768 00:56:00,470 --> 00:56:05,240 but the unsupervised learning - this sort of clustering - was the first step. 769 00:56:06,920 --> 00:56:07,650 So it turns out 770 00:56:10,800 --> 00:56:11,020 that ... 771 00:56:11,780 --> 00:56:16,840 these sorts of unsupervised clustering algorithms are actually routinely used for many different problems, 772 00:56:17,310 --> 00:56:21,110 things like organizing computing clusters, social network analysis, 773 00:56:21,710 --> 00:56:29,760 market segmentation - so if you're a marketer and you want to divide your market into different segments, or different groups of people, to market to them separately - 774 00:56:30,280 --> 00:56:35,800 even astronomical data analysis and understanding how galaxies are formed. These are 775 00:56:36,260 --> 00:56:40,990 just a sort of small sample of the applications of unsupervised learning algorithms and clustering algorithms 776 00:56:41,690 --> 00:56:42,890 that we'll talk about later in this class. 777 00:56:43,740 --> 00:56:44,100 778 00:56:44,960 --> 00:56:50,590 Just one particularly cool example of an unsupervised learning algorithm that I want to tell you about. 779 00:56:51,470 --> 00:56:56,840 And to motivate that, I'm gonna tell you about what's called the cocktail party problem, which is: 780 00:56:57,470 --> 00:57:01,020 imagine that you're at some cocktail party and there are lots of people standing all over.
781 00:57:01,510 --> 00:57:03,650 And you know how it is, right, if you're at a large party, 782 00:57:04,010 --> 00:57:07,990 everyone's talking, and it can sometimes be very hard to hear even the person in front of you. 783 00:57:08,330 --> 00:57:11,020 So imagine a large cocktail party with lots of people. 784 00:57:11,500 --> 00:57:15,270 So the problem is: with all of these people talking, can you 785 00:57:15,680 --> 00:57:19,640 separate out the voice of just the person you're interested in talking to, 786 00:57:20,260 --> 00:57:21,440 with all this loud background noise? 787 00:57:22,580 --> 00:57:23,710 So I'll show you 788 00:57:24,230 --> 00:57:28,940 ... I'll show you a specific example in a second, but here's a cocktail party 789 00:57:29,760 --> 00:57:33,710 that's, I guess, rather sparsely attended by just two people. But 790 00:57:34,830 --> 00:57:35,110 791 00:57:36,580 --> 00:57:39,690 what we're gonna do is we'll put two microphones in the room, 792 00:57:40,430 --> 00:57:43,940 okay? And so because the microphones 793 00:57:44,670 --> 00:57:49,520 are at slightly different distances from the two people, and the two people may speak at slightly different volumes, 794 00:57:50,010 --> 00:57:50,820 each microphone 795 00:57:51,170 --> 00:57:55,280 will pick up an overlapping combination of these two people's voices 796 00:57:55,960 --> 00:57:58,720 - slightly different overlapping voices. 797 00:57:59,170 --> 00:58:02,850 So Speaker 1's voice may be louder on Microphone 1, and Speaker 2's 798 00:58:03,400 --> 00:58:05,150 voice may be louder on Microphone 2, whatever. 799 00:58:05,710 --> 00:58:10,440 But the question is, given these microphone recordings, can you separate out the original speakers' voices? 800 00:58:11,820 --> 00:58:12,080 So 801 00:58:13,300 --> 00:58:19,270 I'm gonna play some audio clips that were collected by Te-Won Lee at UCSD. 802 00:58:19,710 --> 00:58:20,850 Let me show ... 803 00:58:21,600 --> 00:58:28,030 I'm gonna actually play for you the original raw microphone recordings from this cocktail party. So this is the 804 00:58:29,070 --> 00:58:29,860 Microphone 1: 805 00:58:30,880 --> 00:58:34,920 Microphone 1: One, two, three, four, five, six, ... Microphone 2: Uno, dos, tres, cuatro, cinco, seis, 806 00:58:35,380 --> 00:58:41,840 Microphone 1: seven, eight, nine, ten. Microphone 2: siete, ocho, nueve, diez. Instructor (Andrew Ng): So it's a fascinating cocktail party, with people counting from one to ten. 807 00:58:43,030 --> 00:58:44,930 This is the second microphone: Microphone 1: One, two, Microphone 2: Uno, dos, 808 00:58:45,250 --> 00:58:50,750 Microphone 1: three, four, five, six, seven, eight, nine, ten. Microphone 2: tres, cuatro, cinco, seis, siete, ocho, nueve, diez. 809 00:58:52,570 --> 00:59:00,180 Instructor (Andrew Ng): Okay. So in unsupervised learning, we don't know what the right answer is, right? So what we're going to do is take exactly the two microphone recordings you just heard 810 00:59:00,850 --> 00:59:05,110 and give them to an unsupervised learning algorithm and tell the algorithm 811 00:59:06,060 --> 00:59:09,680 to discover structure in the data - [inaudible] - what structure is there in this data? 812 00:59:10,550 --> 00:59:17,390 And we actually don't know what the right answer is offhand.
So give this data to an unsupervised learning algorithm, 813 00:59:18,240 --> 00:59:20,800 and what the algorithm does in this case is it will discover that 814 00:59:21,400 --> 00:59:26,880 this data can actually be explained by two independent speakers speaking at the same time, 815 00:59:27,580 --> 00:59:28,420 and it can further 816 00:59:28,950 --> 00:59:35,170 separate out the two speakers for you. So here's Output 1 of the algorithm: Output 1: One, two, three, four, 817 00:59:35,520 --> 00:59:36,650 five, six, 818 00:59:37,020 --> 00:59:40,220 seven, eight, nine, ten. 819 00:59:40,630 --> 00:59:45,480 Instructor (Andrew Ng): And here's the second output: Output 2: Uno, dos, tres, cuatro, cinco, seis, 820 00:59:45,940 --> 00:59:48,480 siete, ocho, nueve, diez. 821 00:59:49,090 --> 00:59:51,000 Instructor (Andrew Ng): And so the algorithm discovers that, 822 00:59:51,490 --> 00:59:56,580 gee, the structure underlying the data is really that there are two sources of sound, and here they are. 823 00:59:57,540 --> 00:59:59,890 I'll show you one more example. This is a, 824 01:00:00,800 --> 01:00:03,320 well, this is a second sort of 825 01:00:04,040 --> 01:00:12,920 different pair of microphone recordings: Microphone 1: One, two, three, four, five, six, seven, eight, nine, ten. Microphone 2: [Music playing.] 826 01:00:13,290 --> 01:00:15,940 Instructor (Andrew Ng): So the poor guy is not at a cocktail party. He's talking to his radio. 827 01:00:16,500 --> 01:00:20,560 Here's the second recording: Microphone 1: One, two, three, four, five, Microphone 2: [Music playing.] 828 01:00:20,890 --> 01:00:24,610 Microphone 1: six, seven, eight, nine, ten. Microphone 2: [Music playing.] 829 01:00:24,880 --> 01:00:28,510 Instructor (Andrew Ng): Right. And we get this data. It's the same unsupervised learning algorithm. 830 01:00:28,800 --> 01:00:33,270 The algorithm is actually called independent component analysis, and later in this quarter, you'll see why. 831 01:00:33,900 --> 01:00:35,700 And the output's the following: 832 01:00:36,140 --> 01:00:41,600 Output 1: One, two, three, four, five, six, seven, eight, nine, ten. 833 01:00:42,410 --> 01:00:50,060 Instructor (Andrew Ng): And that's the second one: Output 2: [Music playing.] 834 01:00:51,560 --> 01:00:56,400 Instructor (Andrew Ng): Okay. So it turns out that beyond solving the cocktail party problem, 835 01:00:56,800 --> 01:01:02,430 this specific class of unsupervised learning algorithms is also applied to a bunch of other problems, 836 01:01:02,820 --> 01:01:03,880 like in text processing or 837 01:01:04,230 --> 01:01:10,830 understanding functional brain imaging data, like the magnetoencephalogram - which is a bit like EEG data. We'll talk about that more 838 01:01:11,300 --> 01:01:11,820 when we go and 839 01:01:12,310 --> 01:01:16,730 describe ICA, or independent component analysis, algorithms, which is what you just saw. 840 01:01:17,330 --> 01:01:17,640 841 01:01:18,420 --> 01:01:20,490 And as an aside, 842 01:01:21,620 --> 01:01:23,630 this algorithm I just showed you - it seems like it must be 843 01:01:23,920 --> 01:01:29,450 a pretty complicated algorithm, right, to take these overlapping audio streams and separate them out. It sounds like a pretty complicated thing to do. 844 01:01:30,330 --> 01:01:30,760 So you're gonna ask, 845 01:01:31,030 --> 01:01:33,850 how complicated is it really to implement an algorithm like this?
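As it happens, not very. A hedged sketch of the idea - run on synthetic signals rather than the actual recordings above, and using a simple FastICA-style iteration rather than the exact algorithm behind the demo - fits in a couple of dozen lines of Octave:

    % Minimal ICA sketch: mix two invented independent sources,
    % then recover them (up to order, sign, and scale).
    n = 5000;
    S = [sin((1:n)/7); rand(1,n) - 0.5];    % source 1: a sine; source 2: uniform noise
    A = [0.6 0.4; 0.45 0.55];               % the unknown "room": a mixing matrix
    X = A * S;                              % the two "microphone" recordings

    X = X - mean(X, 2);                     % center the mixtures
    [E, D] = eig(cov(X'));
    Z = D^(-1/2) * E' * X;                  % whiten: unit covariance

    W = orth(randn(2, 2));                  % random orthogonal starting point
    for it = 1:100                          % fixed-point iteration, cubic nonlinearity
      W = (Z * (Z'*W).^3) / n - 3*W;        % push toward maximally non-Gaussian directions
      [U, ~, V] = svd(W);  W = U * V';      % re-orthogonalize the unmixing directions
    end
    Y = W' * Z;                             % rows of Y approximate the original sources

Conceptually this is the cocktail party in miniature: the only assumption doing the work is that the two sources are independent and non-Gaussian.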
846 01:01:34,870 --> 01:01:35,700 It turns out 847 01:01:37,280 --> 01:01:40,300 if you do it in MATLAB, you can do it in one line of code. 848 01:01:41,370 --> 01:01:45,500 So I got this from Sam Roweis at the University of Toronto, and 849 01:01:46,260 --> 01:01:49,990 the example I showed you actually used a more complicated ICA algorithm than this. 850 01:01:50,290 --> 01:01:52,370 But nonetheless, I guess this is why, 851 01:01:53,050 --> 01:01:58,080 for this class, I'm going to ask you to do most of your programming in MATLAB and Octave, 852 01:01:58,750 --> 01:02:03,210 because if you try to implement the same algorithm in C or Java or something, 853 01:02:04,220 --> 01:02:05,390 I can tell you from personal, 854 01:02:05,800 --> 01:02:10,570 painful experience, you end up writing pages and pages of code rather than relatively few lines of code. 855 01:02:11,810 --> 01:02:17,330 I'll also mention that it did take researchers many, many years to come up with that one line of code, 856 01:02:17,850 --> 01:02:19,580 so this is not easy. 857 01:02:20,290 --> 01:02:21,260 858 01:02:23,800 --> 01:02:29,370 So that was unsupervised learning, and then the last of the four major topics I wanna tell you about is 859 01:02:29,750 --> 01:02:31,470 reinforcement learning. 860 01:02:32,130 --> 01:02:38,700 And this refers to problems where you don't do one-shot decision-making. So, for example, 861 01:02:39,230 --> 01:02:42,390 in the supervised learning cancer prediction problem, 862 01:02:42,910 --> 01:02:44,530 you have a patient come in, 863 01:02:44,910 --> 01:02:51,250 you predict that the tumor is malignant or benign. And then based on your prediction, maybe the patient lives or dies, and then that's it, right? 864 01:02:51,600 --> 01:02:55,120 So you make a decision and then there's a consequence. You either got it right or wrong. 865 01:02:55,710 --> 01:03:01,850 In reinforcement learning problems, you are usually asked to make a sequence of decisions over time. 866 01:03:02,920 --> 01:03:03,670 So, for example, 867 01:03:04,460 --> 01:03:07,170 this is something that my students and I work on. 868 01:03:07,660 --> 01:03:14,430 If I give you the keys to an autonomous helicopter - we actually have this helicopter here at Stanford - 869 01:03:14,910 --> 01:03:16,580 how do you write a program to make it fly, right? 870 01:03:17,220 --> 01:03:20,310 You notice that if you make a wrong decision on a helicopter, 871 01:03:20,760 --> 01:03:21,050 you know, 872 01:03:21,350 --> 01:03:27,810 the consequence of crashing it may not happen until much later. And in fact, usually you need to make a whole sequence of bad decisions to crash a helicopter. 873 01:03:28,650 --> 01:03:33,910 But conversely, you also need to make a whole sequence of good decisions in order to fly a helicopter really well. 874 01:03:34,920 --> 01:03:35,970 So 875 01:03:36,890 --> 01:03:41,480 I'm gonna show you some fun videos of learning algorithms flying helicopters. 876 01:03:42,110 --> 01:03:46,160 This is a video of our helicopter at Stanford flying 877 01:03:46,590 --> 01:03:49,720 using a controller that was learned using a reinforcement learning algorithm. 878 01:03:50,610 --> 01:03:52,620 879 01:03:52,830 --> 01:03:56,840 So this was done on the Stanford football field, and we'll zoom out the camera in a second. 880 01:03:57,230 --> 01:03:59,870 You'll sort of see the trees planted in the sky.
881 01:04:00,730 --> 01:04:02,830 So 882 01:04:04,470 --> 01:04:05,370 883 01:04:05,760 --> 01:04:12,560 maybe this is one of the most difficult aerobatic maneuvers flown on any helicopter 884 01:04:13,030 --> 01:04:14,160 under computer control. 885 01:04:14,620 --> 01:04:19,540 And this controller, which is very, very hard for a human to sit down and write out, 886 01:04:19,950 --> 01:04:22,370 was learned using one of these reinforcement learning algorithms. 887 01:04:23,020 --> 01:04:23,270 888 01:04:24,050 --> 01:04:29,880 Just a word about that: The basic idea behind a reinforcement learning algorithm is this idea of what's called a reward function. 889 01:04:30,780 --> 01:04:34,120 The way to think about it is: imagine you're trying to train a dog. 890 01:04:34,780 --> 01:04:39,530 So every time your dog does something good, you say, "Good dog," and you reward the dog. 891 01:04:40,460 --> 01:04:42,900 Every time your dog does something bad, you go, "Bad dog," right? 892 01:04:43,950 --> 01:04:49,730 And hopefully, over time, your dog will learn to do the right things to get more of the positive rewards, to get more of 893 01:04:50,130 --> 01:04:52,080 the "Good dogs" and to get fewer of the "Bad dogs." 894 01:04:53,150 --> 01:04:59,480 So the way we teach a helicopter to fly or any of these robots is sort of the same thing. Every time the helicopter crashes, we go, "Bad helicopter," 895 01:04:59,990 --> 01:05:00,210 and 896 01:05:00,690 --> 01:05:05,730 every time it does the right thing, we go, "Good helicopter," and over time it learns how to control itself 897 01:05:05,990 --> 01:05:08,320 so as to get more of these positive rewards. 898 01:05:08,730 --> 01:05:10,990 So reinforcement learning is - I think of it as a way 899 01:05:11,460 --> 01:05:14,610 for you to specify what you want done, so you have to specify 900 01:05:15,170 --> 01:05:17,540 what is "good dog" and what is "bad dog" behavior. 901 01:05:18,140 --> 01:05:20,220 And then it's up to the learning algorithm to figure out 902 01:05:20,660 --> 01:05:25,800 how to maximize the "good dog" reward signals and minimize the "bad dog" punishments. 903 01:05:26,500 --> 01:05:26,910 904 01:05:28,090 --> 01:05:29,570 Just to show you - 905 01:05:30,210 --> 01:05:34,250 so it turns out reinforcement learning is applied to other problems in robotics. 906 01:05:34,680 --> 01:05:39,860 It's applied to things in web crawling and so on. But it's just cool to show videos, so let me just show a bunch of them. 907 01:05:40,480 --> 01:05:42,630 This robot was actually ... 908 01:05:43,560 --> 01:05:47,220 this learning algorithm was actually implemented by our head TA, Zico, 909 01:05:48,100 --> 01:05:53,550 to program a four-legged dog. I guess Sam Shriver in this class also worked on the project, 910 01:05:53,800 --> 01:05:55,070 and Peter Renfrew and 911 01:05:55,660 --> 01:05:58,520 Mike and a few others. 912 01:06:01,180 --> 01:06:05,120 But I guess this really is a good dog/bad dog example, since it's a robot dog. 913 01:06:05,940 --> 01:06:06,820 914 01:06:07,560 --> 01:06:09,030 The second video, on the right, shows 915 01:06:09,670 --> 01:06:12,540 some of the students - I guess Peter, Zico, Tonca - 916 01:06:13,050 --> 01:06:18,900 working on a robotic snake, again using learning algorithms to teach a snake robot to climb over obstacles. 917 01:06:19,640 --> 01:06:23,320 Below that, this is kind of a fun example.
918 01:06:24,170 --> 01:06:25,800 Ashutosh Saxena and Jeff Michaels 919 01:06:26,310 --> 01:06:30,820 used learning algorithms to teach a car how to drive at reasonably high speeds off roads, 920 01:06:31,590 --> 01:06:33,520 avoiding obstacles. 921 01:06:36,040 --> 01:06:37,170 922 01:06:38,010 --> 01:06:42,710 And on the lower right, that's a robot programmed by PhD student Eva Roshen, 923 01:06:43,450 --> 01:06:48,030 who used learning algorithms to teach this sort of somewhat strangely configured robot 924 01:06:48,730 --> 01:06:51,080 how to get on top of an obstacle, how to get over an obstacle. 925 01:06:52,500 --> 01:06:54,600 Sorry. I know the video's kind of small. I hope you can sort of see it. 926 01:06:56,570 --> 01:06:59,400 Okay? So I think all of these are robots that 927 01:07:00,280 --> 01:07:05,600 I think would be very difficult to hand-code a controller for. By using these sorts of learning algorithms, 928 01:07:06,190 --> 01:07:08,630 you can in relatively short order 929 01:07:09,360 --> 01:07:10,070 930 01:07:11,200 --> 01:07:13,330 get a robot to do often pretty amazing things. 931 01:07:24,250 --> 01:07:26,810 Okay. So 932 01:07:28,240 --> 01:07:33,260 that was most of what I wanted to say today. Just a couple more last things, 933 01:07:33,780 --> 01:07:35,540 but let me just check what questions you have right now. 934 01:07:38,310 --> 01:07:41,230 935 01:07:44,100 --> 01:07:45,570 936 01:07:45,980 --> 01:07:51,650 So if there are no questions, I'll just close with two reminders, which are: 937 01:07:53,040 --> 01:07:57,330 after class today, or as you start to talk with other people in this class, 938 01:07:57,770 --> 01:08:02,840 I just encourage you again to try to find project partners 939 01:08:03,320 --> 01:08:04,250 to do your project with. 940 01:08:04,590 --> 01:08:10,730 And also, this is a good time to start forming study groups, so either talk to your friends or post in the newsgroup, 941 01:08:11,210 --> 01:08:12,530 but we just encourage you to 942 01:08:13,120 --> 01:08:18,450 try to start to do both of those today, okay? Form study groups, and try to find two other project partners. 943 01:08:19,180 --> 01:08:19,800 944 01:08:20,260 --> 01:08:24,450 So thank you. I'm looking forward to teaching this class, and I'll see you in a couple of days.