sds podcast episode 309: learning through competition · 2019. 11. 3. · convolutional neural...
TRANSCRIPT
Kirill Eremenko: This is episode number 309 with the legendary data
science instructor, Jose Portilla.
Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur. And each week, we bring you inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today,
and now let's make the complex simple.
Kirill Eremenko: This episode is brought to you by my very own book,
Confident Data Skills. This is not your average data
science book. This is a holistic view of data science
with lots of practical applications.
Kirill Eremenko: The whole five steps of the data science process are
covered from asking the question to data preparation,
to analysis, to visualization, and presentation. Plus,
you get career tips ranging from how to approach
interviews, get mentors and master soft skills in the
workplace.
Kirill Eremenko: This book contains over 18 case studies of real world
applications of data science. It covers
algorithms such as Random Forest, K Nearest
Neighbors, Naive Bayes, Logistic Regression, K-means
Clustering, Thompson sampling, and more.
Kirill Eremenko: However, the best part is yet to come. The best part is
that this book has absolutely zero code. So, how can a
data science book have zero code? Well, easy. We focus
on the intuition behind the data science algorithms, so
you actually understand them, so you feel them
through, and the practical applications. You get plenty
of case studies, plenty of examples of them being
applied.
Kirill Eremenko: And the code is something that you can pick up very
easily once you understand how these things work.
And the benefit of that is that you don't have to sit in
front of a computer to read this book. You can read
this book on a train, on a plane, on a park bench, in
your bed before going to sleep. It's that simple even
though it covers very interesting and sometimes
advanced topics at the same time.
Kirill Eremenko: And check this out. I'm very proud to announce that
we have dozens of five star reviews on Amazon and
Goodreads. This book is even used at UCSD,
University of California San Diego to teach one of their
data science courses. So, if you pick up Confident
Data Skills, you'll be in good company.
Kirill Eremenko: So, to sum up, if you're looking for an exciting and
thought provoking book on data science, you can get
your copy of Confident Data Skills today on Amazon.
It's a purple book. It's hard to miss. And once you get
your copy on Amazon, make sure to head on over to
www.confidentdataskills.com where you can redeem
some additional bonuses and goodies just for buying
the book.
Kirill Eremenko: Make sure not to forget that step. It's absolutely free. It's
included with your purchase of the book, but you do
need to let us know that you bought it. So, once again,
the book is called Confident Data Skills and the
website is confidentdataskills.com. Thanks for
checking it out, and I'm sure you'll enjoy.
Kirill Eremenko: Welcome back to the SuperDataScience Podcast.
Ladies and gentlemen, super pumped to have you
back here on this very special episode of the
SuperDataScience Podcast, because today, we have
none other than the legendary data science instructor,
Jose Portilla.
Kirill Eremenko: Very interesting episode. You're probably wondering
why we recorded it together since we're direct
competitors in the online education space in data
science. Well, we'll answer that question for you right
at the start of the episode. And we thought you'd be
interested to have us both in the same room talking
about your favorite topics such as AI, data science,
and the future of the world.
Kirill Eremenko: So, in this episode, you will hear about neural
networks that create other neural networks, how that
all works and what that means for data scientists.
How to manage and lead a community of over a million
students.
Kirill Eremenko: The question that Jose gets asked the most, as you
can imagine with such large communities, we get
hundreds, I think it's like 500 or so questions per day
that are asked in our courses. And here, you'll find out
what is the most asked question for Jose and how he
answers it.
Kirill Eremenko: You'll also hear about the pyramid of learning and
the pinnacle of learning: what you need to do in
order to know that you have indeed mastered a
topic. And finally, we're going to have a very interesting
debate about artificial general intelligence.
Kirill Eremenko: I really enjoyed chatting to Jose and I can't wait for
you to hear this podcast. So without further ado, I
bring to you the legendary data science instructor,
Jose Portilla.
Kirill Eremenko: Welcome back to SuperDataScience Podcast ladies and
gentlemen, super special guest on the show with me
today, Jose Portilla. Jose, how are you going?
Jose Portilla: Good. Good to be here. You said it right in front of me
as if there was an audience, but we're in an empty
room.
Kirill Eremenko: I know. You got to do it. Man, where are we, Jose?
Jose Portilla: We are at Udemy LIVE in Berlin, in Germany.
Kirill Eremenko: In Berlin. Out of all places ...
Jose Portilla: I know, right?
Kirill Eremenko: ... and we're back in Berlin. What a great party last
night.
Jose Portilla: Yeah. It's fantastic. You had the Birddogs, was it?
Kirill Eremenko: Birddogs. Yeah, if anybody is interested in some cool
cover band in Berlin, check out Birddogs. They were
epic.
Jose Portilla: Yeah, they're fantastic.
Kirill Eremenko: Yeah, Udemy knows how to throw a party.
Jose Portilla: That's very true.
Kirill Eremenko: Yeah. Like a lot of food, a lot of drinks and the
excursion on Friday was really cool.
Jose Portilla: Yeah, the boat tour and then the Boros Collection and
then all that stuff.
Kirill Eremenko: In the bunker that's above ground.
Jose Portilla: I was a little more interested in the building than the
... I don't know. What do you think of ...
Kirill Eremenko: It really was cool.
Jose Portilla: This is getting off topic, but what did you think of the
artwork?
Kirill Eremenko: The artwork, I never understood contemporary art.
Like post modernism. But, what I really liked in this
tour was that they explained it, and that allowed me ...
For example, that one of the trampoline and the arrow,
and the horse. Compared to Picasso, that took five
minutes to put together.
Kirill Eremenko: Maybe it took ages for the person, but it's not ... You
can't really compare that to classic art. It's just
different realms. But the way they explained it, it's
not about what the artwork is, it's about what it
represents, what the person was thinking and kind of
like the idea that they're provoking you to think about.
Kirill Eremenko: And when you think about it that way, it's like
somebody writing down an idea with pen and paper.
But here, they're just doing it with like sketches or
household goods or whatever else. And in that way,
for me, it was much easier to accept. Yeah. So
in that sense, I like the explanations. What about you?
Jose Portilla: That's actually the thing I dislike the most about it. In
my opinion, it's like, if your artwork is so reliant on a
third party having to explain your thesis behind the
artwork, maybe the art is not the best manifestation of
trying to get your message across. Maybe you should
just be writing a paper on whatever topic and it may
be clear to more people. But some of them were like
crazy. Like the one of the images of the houses.
Remember the 9x9?
Kirill Eremenko: Oh, yeah.
Jose Portilla: So, the thing was this ... I guess to explain it to the
listener a little bit. Apparently, there used to be this
old German company that would fly around in a
helicopter, take aerial photos of your home, then go
door to door and try to sell you an aerial photo of your
house. And apparently, it was very popular in the '70s
to have a little aerial photo of your house. And then
Google Maps comes along and they go out of business.
Jose Portilla: So, they have 30,000 essentially stock images that
they did not end up selling, because not everybody
wanted to buy a picture of their house. And then they
gave it to this artist and he manually, instead of using a
convolutional neural network or some filter, he just
looked for patterns. So, then he gets like the nine
images where everyone is washing their car, or the
nine images where all the windows are boarded up in
these houses.
Kirill Eremenko: Yeah. And puts them into one big frame or the entire
collection near each other, and then you have to guess
the name like car wash.
Jose Portilla: Yeah, you have to figure out what's the same or similar
track between all these paintings and images.
Kirill Eremenko: Yeah, definitely some interesting ideas. But fair point
on maybe it's not the best way if you need explanation.
Speaking of the building. So, it's a bunker above
ground from World War II, with two or three meter
thick ceilings and walls. Did they tell you in your
group that the bunker, like, there's actually a living
resident above ...
Jose Portilla: Yes. The whole building is insane. Because you look at
it from the outside. Yeah, it's like concrete, very
industrial or brutalist looking. And I thought to
myself like, "That's weird, bunkers are usually ... I
thought they were underground." And I was like, "I'm
surprised this could have survived the Berlin
bombings."
Jose Portilla: And then you go inside and they show you how thick
the walls are. You're like, "Oh, this could survive
anything, because they're hugely thick." Yeah. Then
later in the tour, you learn the owners of the collection live on the
top floor of this bunker. It's so weird.
Kirill Eremenko: And they explained to us how they managed to do that
because in Berlin, you're not ... You want to tell that
story?
Jose Portilla: You probably remember better than I do. It's some weird
legal thing, right?
Kirill Eremenko: Yeah. In Berlin, you're not allowed to legally add an
extra floor on top of a building that already exists. And
this was a bunker. They didn't want to live in the
bunker. They wanted to add a floor on top and live there. But
the legal loophole was that bunkers ... This building
doesn't fall under the classification of a house.
Kirill Eremenko: It falls under the classification of a bunker, and
bunkers are normally underground, so everything that
we see above ground in this case is considered the
basement. Basement one, basement two, basement
three, basement four. So, they were like, "Oh, we got to
add a top level. We kind of live in a basement."
Something like that. So, that was really fun. So, are
you having a good time in Berlin overall?
Jose Portilla: Yeah, it's been great to ... I've just been traveling
around Europe for this. Yeah. So, it's been nice to get
to see everybody.
Kirill Eremenko: Very nice. Yeah. Well, today's podcast. First of all,
some of our students who know us both will be ...
Jose Portilla: Their minds blown that we're talking to each other.
Kirill Eremenko: Yeah. It would be like thinking, "What? Did the world
turn around?" Because we are apparently like ... Well,
we're competitors. We compete. Fierce competitors at
each other's throats. So, how would you explain that?
Why are we talking at all, let alone recording this
podcast if we're such fierce competitors?
Jose Portilla: It's so funny. Well, we've had this conversation
multiple times, but everyone from the outside thinks
like one of us has to die in order for the other to
survive.
Kirill Eremenko: Hunger Games. Yeah.
Jose Portilla: Yeah. Exactly. But if anything, it's the opposite.
Specifically like at Udemy where ... I don't know. Some
people think like, "Oh, you probably wish that your
competitors come up with really bad courses or
something. That way your courses can reign supreme."
Jose Portilla: When, in fact, the opposite is true, because the worst
thing that can happen to me is that a popular
competitor releases a bad course. Because then
students think, "Oh, even just online education in
general, it wouldn't be that great."
Jose Portilla: Suddenly, it becomes a reflection of not just one
course but their entire online learning experience. So,
one of the best things that can happen to me is to have a
competitor like you with good content. And then it's
like I was telling you earlier, buying a course is not like
buying a car where you buy one car and then many
years later, you're not buying a car until much further
into the future.
Jose Portilla: It's more like buying a book on a topic you like. You're
going to buy multiple books by multiple people. So, the
best thing that can happen to me is to have a competitor with a
good course, which engages a student and then says,
like, "Oh, I can actually learn some of this complex
stuff online. Let me go check out other courses, etc."
So, yeah, it's not some Hunger Games situation.
Kirill Eremenko: Yeah. For sure. And also, engage one course, tell my
friends about it. They'll come and different people like
different styles. You and I have different styles of
teaching, inevitably.
Jose Portilla: Yeah.
Kirill Eremenko: Everybody is unique and somebody might prefer the
way you explain something. Somebody might prefer
the way I ... Or somebody might benefit from both.
Jose Portilla: I was just about to say, different people like both
styles. I would say that the Venn diagram of our
crossover students is huge.
Kirill Eremenko: Yeah.
Jose Portilla: Yeah.
Kirill Eremenko: For sure. And also, what I like about the competition is
it doesn't let you slack off. I mean either of us, because
we hold each other up to a standard. If there was just
one of us, then the standard drops. You, first of all,
might not notice that your standard of teaching is
dropping. Students might not notice, because they
have nothing to compare it to. And then you won't be
incentivized to improve. I like this that like I can't let
my standards drop, you can't let your standards drop,
because of the nature of the competitive market.
Jose Portilla: Yeah. The quality of courses is getting better. I don't
know if you remember your first course. Ever look
back at it?
Kirill Eremenko: Yeah, I have. Oh, it was so [crosstalk 00:13:40].
Jose Portilla: I'm so embarrassed how bad my first courses are.
Kirill Eremenko: I know. It's like night and day.
Jose Portilla: Yeah.
Kirill Eremenko: I do appreciate the effort I put in. Listening back to it,
it took so much courage to start recording.
Jose Portilla: I know. I believe we might not even start doing it,
right?
Kirill Eremenko: Yeah.
Jose Portilla: Because you're by yourself in a room recording it not
knowing if anyone will ever even view this. And I'm
shocked that ... I don't know, it's almost like a different
person made that course, because it's like, "I can't
believe I did this."
Kirill Eremenko: Oh, I can only be grateful to that person who I was for
making that leap. That was good. Okay. So now, with
that out of the way. I don't know, let's maybe talk
about what are some of the recent trends, some of the
recent things that you're seeing in the data science AI
industry that you're creating courses on that students
are excited about?
Jose Portilla: Well, let's see. Recent trends. There's always new
updates to the various deep learning libraries. So, like
TensorFlow 2.0 just came out. Like just, just came
out. Maybe like a couple weeks ago maybe.
Kirill Eremenko: No. I think it was in June.
Jose Portilla: Well, that was the beta or alpha, right?
Kirill Eremenko: Oh, yeah.
Jose Portilla: I mean, the official 2.0 release was pretty recent. And
then PyTorch 1.0 also came out really recently.
Kirill Eremenko: Okay. Very cool.
Jose Portilla: So, those are some new things. New libraries are
always being developed. Maybe this is not such
a new trend, because I recently saw the publication date of
the paper, but something I just recently found out about was
neural architecture search, or NAS, by Google, or
Google AI, where they're basically using recurrent
neural networks to create or search for optimized
architectures for different problems. So, like the
CIFAR-10 dataset, with the 32x32 colored images of 10
different topics like planes, frogs, whatever.
Kirill Eremenko: 60,000 images there.
Jose Portilla: Yeah. What they're doing is they're basically deciding
that humans, since we design everything in a very
structured way, like convolutional neural networks are
very structured with the kernels, everything is still
kind of squared, connected. That perhaps there is
some more organic, more optimized connection.
Jose Portilla: So, they're using a recurrent neural network to
actually build the architecture of another network to
solve for the CIFAR-10 dataset problem. And they were
able to actually improve the performance quite a bit
from whatever the state-of-the-art convolutional neural
network could do.
Jose Portilla: And this is with a network of essentially what looks
like to the human eye randomized connections, and
they can even skip layers and stuff. And so that one
really blew my mind, because I used to think,
like, "Oh, the future is recurrent neural
networks, or the future is convolutional neural
networks." When probably the reality is the future is
some unknown random network that another network
has figured out. That's almost like the ... What is it?
Like the I am Robot or ...
Kirill Eremenko: I am Robot. Yeah.
Jose Portilla: Yeah, the [crosstalk 00:16:53].
Kirill Eremenko: I, Robot.
Jose Portilla: I, Robot? Where you have robots building robots. Now,
we have neural networks building other neural
networks.
Kirill Eremenko: That's really cool. And then you can go deeper than
neural networks building neural networks [crosstalk
00:17:05].
Jose Portilla: Yeah. Then the other thing then is like a loop. Almost
like have a neural network build a neural network for
finding neural networks. What's the most optimized
thing? Yeah, that one really blew my mind, because it
really showed that the shape of the actual network
seems to have quite a bit more importance than
the weights.
Jose Portilla: And it's not something I think ... Well, this was
published in 2017. Now, people, I'm sure, are really
thinking about it. But definitely just five years ago, I
don't think that many people were thinking about if a
randomized neural network would actually perform
better than a structured one, given the same
randomized initialization of weights.
Kirill Eremenko: Yeah. Interesting, because you sent me that paper. I
had to look through it. First of all, I was surprised. I
was like, "Yeah, this is 2017." But still, it also, as you
said, blew my mind that you have, from scratch, this
neural network that they created to create new neural
networks was building them from absolutely zero and
outperforming by a small margin, like 0.09% better
performance and 1.05x faster than the human ones,
but still outperforming them on the CIFAR, right?
Jose Portilla: Yeah, CIFAR-10. Yeah.
Kirill Eremenko: CIFAR-10 dataset. That was really cool. The way I
understood it, the way it works is it takes the neural
network that it is building, or wants to build, and
represents it as a variable-length string. So, it puts it
into a text string basically. The representation of the
neural network. And then it iterates through that
string through what was ... Gradient descent, right?
Jose Portilla: Mm-hmm (affirmative).
Kirill Eremenko: To optimize for the accuracy of the image prediction. Is
that about right?
Jose Portilla: Yeah, basically. Yeah. I think maybe the reason I
found out about it so recently was I recently, this year
for sure, even though maybe it was published two,
three years ago. I recently saw the pictures of the
neural network architectures that the RNN was
actually solving for.
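To make the "network as a variable-length string" idea from the conversation concrete, here is a minimal sketch in Python. It is not the method from the paper: the RNN controller is swapped for plain random search, and the token vocabulary (`FILTER_SIZES`, `NUM_FILTERS`) and the `toy_score` function are hypothetical stand-ins. A real search would train each decoded network and score it by validation accuracy.

```python
import random

# Hypothetical search space: each layer is described by two tokens.
FILTER_SIZES = [1, 3, 5, 7]
NUM_FILTERS = [24, 36, 48, 64]


def sample_architecture(rng, max_layers=6):
    """Sample a variable-length token list: [size, filters, size, filters, ...]."""
    depth = rng.randint(1, max_layers)
    tokens = []
    for _ in range(depth):
        tokens.append(rng.choice(FILTER_SIZES))
        tokens.append(rng.choice(NUM_FILTERS))
    return tokens


def decode(tokens):
    """Turn the flat token list back into per-layer settings."""
    return [
        {"filter_size": tokens[i], "num_filters": tokens[i + 1]}
        for i in range(0, len(tokens), 2)
    ]


def toy_score(tokens):
    """Stand-in for validation accuracy (assumption: no network is trained here)."""
    layers = decode(tokens)
    capacity = sum(layer["num_filters"] for layer in layers)
    return capacity / (1.0 + 0.5 * len(layers) ** 2)


def random_search(trials=200, seed=0):
    """Keep the best-scoring architecture among randomly sampled ones."""
    rng = random.Random(seed)
    best = max((sample_architecture(rng) for _ in range(trials)), key=toy_score)
    return best, toy_score(best)


best_tokens, best_score = random_search()
print(decode(best_tokens), round(best_score, 3))
```

The point of the flat token list is that a sequence model can emit it one token at a time, which is why a recurrent network is a natural fit for proposing architectures of different depths.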
Jose Portilla: And it was the weirdest looking. I mean, it looked like
a little kid drew sloppy lines with random neurons
everywhere. Nothing was even. You would expect
maybe the RNN would find some sort of hidden
structure, right? But it was just ...
Kirill Eremenko: Unstructured.
Jose Portilla: Yeah, for better or for worse, it looks more like an
organic brain. Like an actual biological brain, right?
Kirill Eremenko: That's so cool. You got to send me those images,
because [crosstalk 00:19:53].
Jose Portilla: Yeah, I'll have to find them. You look at them and it's
like there's no way this performs better than a
structured network.
Kirill Eremenko: Have you ever seen those images of when certain parts
of a building or an airplane instead of a human
designing them, they get an AI to build it?
Jose Portilla: Oh, yeah.
Kirill Eremenko: Through reinforcement learning. And it's completely
weird, completely random. Like simple parts that hold
... You know that part under a table that holds the legs
of the table to the main part of the table?
Jose Portilla: Yeah.
Kirill Eremenko: Like 90 degree type of angle metal thing. Like if you get
an AI to design, it looks completely randomly weird.
And it's like 30% lighter, 100% stronger, less material
required. It looks very organic.
Jose Portilla: Yeah, I remember I was once in a museum. And one of
the exhibits was an antenna that was designed by a ...
It wasn't technically AI. It was like a genetic algorithm
that tried to keep solving for what kind of antenna could
get the strongest signal.
Jose Portilla: And the antenna looks so weird. It looked like a string
of spaghetti like floating in space or something. And it
was like, "Yeah, this is what the algorithm figured
would get the strongest signal in this particular spot."
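For readers who have not seen a genetic algorithm, the loop Jose describes can be sketched roughly as follows. This is a toy illustration only: the `fitness` function is a hypothetical stand-in for measured signal strength, and the genome is just a list of numbers rather than an antenna shape.

```python
import random


def fitness(genome):
    """Hypothetical stand-in for signal strength: higher is better, peak at all 0.5."""
    return -sum((g - 0.5) ** 2 for g in genome)


def mutate(genome, rng, scale=0.1):
    """Nudge each gene by a small random amount."""
    return [g + rng.gauss(0.0, scale) for g in genome]


def crossover(a, b, rng):
    """Build a child by taking each gene from one of two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]


def evolve(generations=50, pop_size=30, genes=8, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(round(fitness(best), 4))
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. Nothing in the loop prefers shapes that look tidy to a human, which is one way to read the spaghetti-like antenna.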
Jose Portilla: And it just goes to show that it's really hard to have
intuition for some of this stuff. And it kind of makes
sense. I don't know. The more you study evolution and
biology, certain animals are super weird. Like you see
a platypus, or like a squid has a beak like a bird. It's
so bizarre, but I don't know. Nature is essentially a
really long reinforcement learning algorithm, where it's
like many, many generations, what works, what
doesn't work.
Kirill Eremenko: Yeah. But what I find interesting. I was also thinking
about it just now, that at the same time in nature, a
lot of things are symmetrical.
Jose Portilla: Yeah, right?
Kirill Eremenko: As weird as they are, they're symmetrical, but what AI
designs most of the time is asymmetrical. There's
kind of a combination of both in nature.
Jose Portilla: Yeah. And then not to get too philosophical, but then
you see certain numbers keep popping up in nature
like a pi or something.
Kirill Eremenko: Oh, the Fibonacci number.
Jose Portilla: Yeah. Or the fact that the definition of a normal
distribution, the actual function for it, has pi in there.
It blows my mind. How is this freaking number
showing up everywhere, in things that you wouldn't
think it would show up in? You wouldn't think that
the relationship of a circle would have much to do with a
normal distribution. But then it happens [crosstalk
00:22:28].
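The pi that Jose mentions sits in the normalizing constant of the normal density, 1 / (sigma * sqrt(2 * pi)). As a small check, the sketch below integrates the bell curve numerically; the helper names `normal_pdf` and `integrate` are just illustrative choices, and the integration range of -10 to 10 is an assumption that is wide enough for a standard normal.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution; note the pi in the normalizing constant."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, lo, hi, steps=100_000):
    """Trapezoidal rule on an evenly spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# Area under the unnormalized bell curve exp(-x^2 / 2).
raw_area = integrate(lambda x: math.exp(-0.5 * x * x), -10.0, 10.0)

# Area under the full density, which should be very close to 1.
area = integrate(normal_pdf, -10.0, 10.0)

print(raw_area, math.sqrt(2.0 * math.pi), area)
```

The unnormalized curve has area sqrt(2 * pi), so dividing by that number is what makes the total probability come out to 1. That is where the circle constant enters the formula.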
Kirill Eremenko: And then everything follows. Like the heights of
humans. I don't know, populations of animals,
bacteria. A lot of things are normally distributed in
this world.
Jose Portilla: Yeah. I don't know. There may be some deeper order to
things that we're just not getting, but yeah. Yeah, like I
said, you see a platypus and you're like, "There must
be some random noise here."
Kirill Eremenko: Crazy. All right. Well, shifting gears a little bit. You
teach online. And by the way, congratulations on 1.2 million
students.
Jose Portilla: Yeah. Well, congratulations to you too, to the both of
us, I guess.
Kirill Eremenko: It's crazy. How does it make you feel? 1.2 million
students worldwide.
Jose Portilla: Oh, it feels bizarre. I remember thinking like a long
time ago, like, "Man, when I hit 100,000 students, that
will be it. I would have hit the ultimate goal." And then
you hit that, and then you've hit it too. And then you
think, "Okay. 250,000 students, let's really go for it."
Some crazy goal. Then you hit that and you're like,
"Oh, okay, half a million." Yeah, it's just been
absolutely insane how fast everything has been
growing just in a couple of years.
Kirill Eremenko: Yeah, it is very fast. We're, I think, at 920,000
[crosstalk 00:23:45].
Jose Portilla: Yeah. I bet you, if we had this same conversation even
just some weeks from now, you would have had a
million as well.
Kirill Eremenko: Yeah. Probably.
Jose Portilla: Next time I see you, for sure, you'll have at least a
million, if not much more.
Kirill Eremenko: Yeah. That puts a lot of responsibility, right? You got
to create the right content, the right guidance. It's no
longer just fun and games and just putting out there
like things that you're passionate about. But you also
got to think through what do people need? What do
your 1.2 million students need? What are their
requirements?
Kirill Eremenko: You got to think about the needs of the students. I
guess my question to you is how do you go about that?
How do you go about communicating with your
audience and finding out what is it that you can help
them with the most in this next stage of your journey?
Jose Portilla: That's an interesting question. It's almost like as we've
been progressing through this online education world
and this population of students, the analogies keep
changing. So, at first, it was like, "Okay, I can
structure myself as if I teach a course, like a
classroom of 30 students."
Jose Portilla: Then it starts getting too big. It's like, "Okay. Well,
now, I'll structure myself like a seminar." So, maybe I'll
have a set piece of notes for students like they
would in a large seminar class. Less one on one
interaction.
Jose Portilla: Then it starts getting bigger, it's like, "Well, I guess
now I'm structuring myself as a department in a
college." So now, I have TAs or something and much
more standardized practice across multiple courses or
something like that.
Jose Portilla: And then you're structuring yourself as a university or
something. So then now you have multiple
departments of like, "Oh, Python topics, or R topics or
Tableau topics, etc." And then there's some sort of
structure within those, etc.
Jose Portilla: And now with our scale, it's almost like the analogy
becomes like a city or something. So, then you have to
start thinking of ... At this point, one-on-one
interaction as much as I love it is kind of impossible.
We can't communicate with every citizen now that we
have a million people, right?
Jose Portilla: So, then you start trying to think of what does a city
do. So, they may have meetups. So then we try to have
different sources for students to interact with each
other. That's maybe a little more fluid. And this is
something maybe you can have advice for me, because
I know you're probably better at this than I am.
Jose Portilla: But just trying to build that sense of community.
Maybe off of Udemy, because the Q&A forums, for
interaction purposes from one student to another, aren't
exactly optimized. First, we tried Slack. That quickly
got unscalable, because we couldn't pay for every
student, and it deletes the history.
Jose Portilla: Then we tried Gitter, which is kind of like this Slack
based off GitHub, but that was also starting to have
scaling problems. And then we switched to Discord,
which I hadn't really heard of it before until someone
suggested it to me. And it's like for online gaming. Do
you know what I mean?
Kirill Eremenko: Yeah, I've heard of them.
Jose Portilla: Yeah. So, it's a free version of essentially what Slack
does. And so, so far, that's what we're using to try to
help scale a sense of community. Yeah, and they can
do things like ... Well, like I said, you're probably
better at this than I am of things like a podcast or
something, to build a sense of community or some sort
of weekly updates, that kind of thing where ...
Jose Portilla: You're not going to be able to talk to each student, but
at least you can try to encourage students talking to
one another. So, I think as we scale larger, trying to
encourage the student interaction is one of my
priorities.
Kirill Eremenko: Yeah, I absolutely agree. I wouldn't say much better
than you at this. At SuperDataScience, we're also
exploring things. So, right now, we are trying out the
Slack approach that you've already tried. We're also
considering an approach of forums, an approach of
building our own system because our whole LMS at
SuperDataScience, the learning management system is
completely custom built by ourselves.
Jose Portilla: Yeah. [Murray 00:28:03] told me that.
Kirill Eremenko: And so we can add on whatever we want to just like ...
We just need to see that there's a need for it and
there's time. But in general, I completely agree with
you that as much as, well, I want to interact with
everybody. I simply physically cannot do that. And
therefore, putting people into groups to talk to each
other. That's the best.
Kirill Eremenko: I'll give you an example. I was at DataScienceGO, the
conference we run in San Diego. I was running a
workshop on Tableau. And there's, I think, like 60
people in the room. All different levels. And I said right
away, "This is a workshop for beginners. If you're
advanced, there's another workshop in this
neighboring room about AI ethics. Go there. You get a
lot of value out of that. This is a workshop for
beginners."
Kirill Eremenko: I think one person changed rooms. But still, there
were a lot of different levels here. Very advanced
people, beginner people. While we were going through
these exercises on building this dashboard, some
people are going really fast ahead and I thought, "What
are you doing in this room? I told you go to the other
one." And they're like, "Yeah. No, I just wanted to play
around, see what the dashboard will be like, see what
the dataset will be like."
Kirill Eremenko: And so what I started doing is said, "All right, if you
went ahead, like far ahead, why don't you get up and
help somebody who's falling behind? There's 60
people, which is not a million, but still, I can't go help
everybody."
Kirill Eremenko: And so the more advanced people, like I remember
specifically Jonathan and Ogo. If they're listening to
this, a huge thank you for that. They just got up and
helped out a lot of people. And there were others as
well. And in that sense, nobody was bored. Everybody
was keeping up.
Kirill Eremenko: And I think that sense of community is amazing in
data science. Data scientists want to help each other.
Our job is to facilitate that and find the best way. It
looks like we're both exploring to find like what is the
right medium for this community to thrive.
Jose Portilla: Yeah. I don't know. It almost sounds douchey to say
this, but we really are pioneers in this space, because
there's no one else we can really talk to of like, "How
do you deal with the community of students this
large?" Where you don't have some university or
company level team to handle all of it. So, we have to
explore these different methods.
Jose Portilla: And the other thing I was going to say about the
students interacting with each other. I think students get
a lot out of it as far as the ... There's some more official
term for it, but like the pyramid of knowledge or the
steps to really understanding a topic.
Jose Portilla: Like the very final step is teaching a topic. So, you
know you understand something if you're able to teach
it. So, I think it helps the students to help other
students because then they know that they really
understand the topic if they're able to help out another
student in it.
Kirill Eremenko: That's a great way of putting it.
Jose Portilla: Yeah. There's some official term for this that someone
will have to Google that there's a hierarchy of
understanding. And the very last or top level is the
ability to teach it. It has some sort of proper noun
name, after whoever discovered it.
Kirill Eremenko: Okay. Yeah, I think I've heard of this well before, but it
doesn't come to top of head, but I agree with you.
Yeah.
Jose Portilla: Although I teach stuff, I feel like I don't understand
crap. Even though I teach them.
Kirill Eremenko: Why did you love it?
Jose Portilla: Because it's like a new thing. Every five seconds in this
freaking field.
Kirill Eremenko: Oh, yeah.
Jose Portilla: But actually, I was going to say that might be one of
the more positive aspects of the field we work in is that
the libraries are so new sometimes. And because if you
are the world's expert in TensorFlow 2.0 and you are
not a developer at Google that was actually working on
it, the amount of total experience you can have at this
moment in time, is at most like one or two years,
right?
Jose Portilla: Technically, it's based on Keras, so you could kind of
have more experience. But for something like PyTorch
1.0 as well, the most experience you could possibly
have to be the world's expert is just a few years versus
like calculus or whatever. It was around since you
were born, so you could have a lot more experience in
it.
Jose Portilla: And I think because in this field, so many people
remember what it's like to be a beginner, because it
was not that long ago that they were a beginner
themselves just by the nature of the field. They don't
mind helping out, because it was not too long ago
themselves that they knew nothing about like
TensorFlow or PyTorch.
Jose Portilla: So, I think that definitely helps out. Just a sense of
community that for whatever reason, data science and
Python has, versus some other ... Not to disparage
other communities, but like ...
Kirill Eremenko: Consulting.
Jose Portilla: Yeah, like consulting or some people in JavaScript or
web development that's been around much longer like
HTML, CSS and JavaScript. There's definitely an
attitude of like, "Oh, you don't get this? Whatever."
Jose Portilla: Because they've had enough time with it like 15 or 20
years since web 1.0 that it's probably faded from their
mind of what it's like to be a beginner versus Python
and data science. That the libraries are constantly
being updated, and there's a new library every year so
to speak. Everyone remembers what it's like to be a
beginner, so they don't mind as much helping out.
Kirill Eremenko: Got you. Is your community mostly beginners? What
did you ...
Jose Portilla: That's a great question of the general skill level, the
community. It depends how you define beginner,
because they come from all walks of life, right?
Kirill Eremenko: Yeah.
Jose Portilla: So, there's people that, yeah, they've never
programmed before. But they'll have a PhD in ...
Kirill Eremenko: Psychology.
Jose Portilla: Yeah, psychology or something. So, they're not
beginners in the sense that they're beginners at
learning, because this person is clearly able to be self
motivated and teach themselves complex topics. It's
just that they didn't take a Python class in university
because it wasn't taught there for them.
Jose Portilla: And then there's other people that they already work
at AWS or something or they're already working at
Google, and their boss just said, "Oh, I need you to
learn this esoteric library in Python or R or whatever."
Jose Portilla: And then they're definitely not beginners and they ...
For them, it's almost like they just need to pick and
choose certain lectures from the courses of like, "Oh,
let me quickly just learn this couple things my boss
told me to learn." I think, yeah, the majority of our
"beginners" ...
Kirill Eremenko: Like newcomers to data science.
Jose Portilla: Exactly, yeah. They're not beginners in the sense that
they don't know anything. They usually have some
sort of expertise in a field outside of data science or
programming. And I think it kind of attracts that mind
that you are already technically adept at something. It
makes you interested in the possibilities of leveraging
data science in Python with your current skillset.
Kirill Eremenko: Definitely. That's something we're also seeing, I think,
because of all ... Between 60% and 70% are
newcomers to data science. Whether just college
students or transitioning into data science. And then
about 20 or so percent are more advanced
practitioners. And then about 10% are managers,
executives, entrepreneurs. But what I find interesting
is that over time ... Because we've been doing this for
years. How long have you been teaching?
Jose Portilla: Since March 2015.
Kirill Eremenko: 2015. I started on Udemy in 2014, but in data science,
it was, I think, June 2015. And so similar timeline,
right?
Jose Portilla: Yeah.
Kirill Eremenko: And so over that ... That was, what, four years. I've
seen people grow from beginner to intermediate to
almost advanced practitioner level. I've seen people get
jobs and so on. And it's really cool to see this growth
and especially if you get to meet them in person. That
is just fantastic.
Kirill Eremenko: They're like, "Oh, I remember you three years ago, you
were like asking these questions and you were just
starting out into your journey or transitioning from
whatever other career you had. And now, you're a data
scientist. You're coaching others. People are asking
you for advice." That is so inspiring.
Jose Portilla: It blows my mind sometimes, like the careers that
some of my students have been able to get. I was just
talking to someone recently who ended up becoming a
senior developer for AWS. I start thinking to myself,
"Would I be able to get that job? I don't think I would."
Given the interview process and how hard it is.
Jose Portilla: And they're like, "Oh, thank you. Your course helped
me that so much." I was like, "I don't know if I could
do your job." It blows my mind when you see students
getting jobs that like, "I don't think, I would probably
fail that interview if I wasn't really practicing for it."
Yeah. So, it's crazy the growth of the students and how
fast everything has been going just in the past few
years.
Kirill Eremenko: That's absolutely true. What's the most common
question that students ask you?
Jose Portilla: Where do I find the notes?
Kirill Eremenko: You get like hundreds of questions. Like we both get
hundreds of questions.
Jose Portilla: Yeah. Well, there's certain questions that's just like ...
It's also a bit of a selection bias of the kind of person
that asks a question on forums or something. It is
usually a person who has not done a quick Stack
Overflow search or something.
Jose Portilla: But beyond that, beyond little silly questions like that,
maybe one of the most common questions I get is like,
"How do I choose a machine learning model?" One
thing I do is I point them straight to the ... You know
scikit-learn. They have their choosing an estimator
diagram. It's like this weird, ugly little bubble tree
chart. That's like, "Oh, if you have this many data
samples, choose this. If you're trying to do
unsupervised or supervised, do that."
Jose Portilla: So, I point at that chart, but then I also tell them that
realistically, for some of these models, it's difficult to
have an intuition for them. Once you deal with them a
lot, then you can be like, "Oh, I think you should do
this. Blah-blah." If you're about to do an SVM, there's
not that many people that would have a strong
intuition of what the exact C value or gamma value
should be, right? They pretty much always just do a
grid search. And the same for choosing a model. You
usually run a couple and see what performs best or
then make a combination of models.
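Editor's note: the grid search Jose mentions for an SVM's C and gamma
values can be sketched roughly as follows. This is a minimal
illustration, not anything shown on the episode. It assumes
scikit-learn is installed, uses the bundled iris dataset as a
stand-in problem, and the candidate values in param_grid are
arbitrary picks for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy dataset bundled with scikit-learn (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Scale the features first; RBF-kernel SVMs are sensitive to feature scale.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Candidate values are illustrative, not recommendations.
# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

# Try every (C, gamma) pair with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

The point matches what Jose says: rather than guessing C and gamma
from intuition, every combination is tried and the best
cross-validated score wins.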
Jose Portilla: And I think a lot of students sometimes go into it
thinking like, "By the end of this, I will know exactly
what model to choose in any situation." When
realistically, you're still going to have to test out
different models. And I think it's hard to convey to
students that even after you are extremely
knowledgeable on this topic, when it comes to a new
problem, you're still just going to have to do what
everyone else does, explore the unknown, not really
know what's the best model.
Jose Portilla: So, you can be the world's top expert. At the end of the
day, when it comes to a new problem, you're still going
to have to kind of guess and check almost. Which is
kind of, to bring it back, exactly what that neural
architecture search is doing, right? Keep guessing and
checking until you find the good fit for the good model.
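Editor's note: the "guess and check" approach to choosing a model
can be sketched as running a few candidates through the same
cross-validation and comparing scores. This is a hedged example
only. It assumes scikit-learn is installed, uses the iris dataset
as a placeholder problem, and the three candidate models are
arbitrary choices, not ones named in the conversation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder dataset (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Arbitrary shortlist of candidate models to try.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score each candidate with the same 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Print candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best_name = max(scores, key=scores.get)
print("best on this dataset:", best_name)
```

Which model comes out on top depends on the data, which is why
even an expert ends up running the comparison.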
Kirill Eremenko: Or what AutoML is designed to do.
Jose Portilla: Exactly. Yeah.
Kirill Eremenko: Do you think AutoML will replace data scientists?
Jose Portilla: That is such a good question, because I used to think
like, "Oh, crap. Maybe we're going to be out of a job."
Especially this robot building robots and models
building models. What's left for us?
Jose Portilla: I don't know. I think what is defined as a simple
problem keeps expanding as you go throughout time.
Because something like a linear regression task many,
many years ago, that's [goaltending 00:39:37]. It's just
beginning to figure it out. That's an extremely hard
problem. How do I find the line of best fit?
Jose Portilla: Now, that's an extremely easy problem. So, I don't
think it will replace data scientists or machine learning
practitioners. It will just basically push them to harder
problems and reclassify things as easily solvable
problems or easier problems for something to be
automated against.
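Editor's note: to show how routine the "line of best fit" problem
has become, here is a minimal sketch of ordinary least squares for
a single feature in plain Python. The function name fit_line and
the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope}, intercept={intercept}")  # slope=2.0, intercept=1.0
```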
Kirill Eremenko: Absolutely. And I think there's always going to be room
for human creativity in these aspects. At least for the
next 10 years.
Jose Portilla: Yeah. Then you see the neural networks that are
painting and the recurrent neural networks that are
doing text generation, like character. I'm sure you've
read that blog post of the unreasonable effectiveness of
recurrent neural networks.
Kirill Eremenko: Oh, yeah.
Jose Portilla: [crosstalk 00:40:27]. And it's writing out Shakespeare.
Kirill Eremenko: Yeah, that's an old blog post book.
Jose Portilla: That's a very old blog post.
Kirill Eremenko: Like 2015 or something. A really good one as well.
Jose Portilla: Fantastic. And it always blows my mind that the
network is doing it character by character, not word by
word. The fact that you can even read it blows my
mind.
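Editor's note: the character-by-character idea can be illustrated
without a neural network at all. The sketch below is a simple
character-level Markov (n-gram) model, a much weaker stand-in for
the recurrent network in the blog post, and it is not the method
from that post. The names build_model and generate, the tiny
corpus, and the context length of 3 are all assumptions chosen for
illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=3):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length=60, rng=None):
    """Extend `seed` one character at a time; len(seed) must equal the order."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:  # context never seen in the corpus: stop early
            break
        out += rng.choice(followers)
    return out


# Tiny illustrative corpus; a real model would train on far more text.
corpus = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer "
)
model = build_model(corpus, order=3)
sample = generate(model, seed="to ", length=60)
print(sample)
```

Even this crude model only ever picks the next character, never a
whole word, yet the output tends to contain recognizable words.
A recurrent network does the same next-character prediction with
a learned, much longer memory.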
Kirill Eremenko: Have you seen ... There's a movie that they filmed
based on the script created by [crosstalk 00:40:51].
Jose Portilla: I have heard of it. I definitely have not seen it.
Kirill Eremenko: What is it called? I forgot. Solar something. I'll link to
it in the show notes and I'll send it to you as well. It is
ridiculous. They got Middleton. So, the actor from
Silicon Valley. You know that TV show?
Jose Portilla: Oh, yeah.
Kirill Eremenko: Jeff Middleton or something.
Jose Portilla: I forget his name, but yeah, I know what you're talking
about.
Kirill Eremenko: Yeah. And then they got him to act the main role and
it's like a whole script written by this neural network
that even gave itself ... It's been a while. I forgot. It
called itself Barney. It called itself a name. It's like a
30 or 15 minute long short movie. It was on the
London Film Festival I think.
Kirill Eremenko: The sentences themselves make sense by what people
say in the movie, but overall it's complete nonsense,
but they still acted it out in a way that you get like
sure goosebumps down your spine like, "Wow. This is
a space saga of a love story in it." It's pretty funny.
Jose Portilla: Yeah, it's crazy, because it's clear that the networks
are able to easily conquer now like things like
grammar. It will just take a deeper network to conquer
something like plot, right? I don't know if you're a ...
This was maybe within the past year. OpenAI created
basically a model to produce text articles.
Kirill Eremenko: No, I don't know.
Jose Portilla: Yeah, that was really interesting because they did not
release the full model because they thought it was too
dangerous, because they basically ... With a seed
sentence of Syria blah-blah-blah.
Jose Portilla: Suddenly, this model could generate a full ... It was
essentially like fake news text article that read
perfectly. That really read like someone had written it
personally. And it was just completely made up by a
network. And they decided it was so good at generating
fake news style articles that they refused to release the
full network.
Kirill Eremenko: That is crazy.
Jose Portilla: Yeah.
Kirill Eremenko: This kind of reminds me the story of CRISPR. The lady
that developed CRISPR for adjusting genes. As soon as
it came out of the lab was like ... If I'm not mistaken,
she was like, "This is very dangerous for the world.
What have we created?"
Jose Portilla: Yeah. It's almost like a milestone of you know a
technology is really good and really worth pursuing if
it's always like this double edged sword. Something
like atomic science, right? Like you have this really
interesting aspect of nuclear energy. And at a certain
reactor, it's like a thorium reactor, whatever, has the
potential for very low nuclear waste and you're
conquering the atom itself, like what the universe is
built out of.
Jose Portilla: On the other hand, you also have the ability to create
a nuclear weapon. And I think it's like that for
anything. You have convolutional neural networks that
can detect cancer or skin cancer better than any
doctor could. But then, at the same time, you could
abuse these networks to then begin racial profiling
based off corrupt datasets.
Jose Portilla: Yeah, any technology I think has the ability to be
exploited for good or bad. But at least it's a good signal
that you're onto something. Like CRISPR, like you
were saying, if you see a child with a birth defect or
something, the fact that you could maybe fix it
preemptively is fantastic. But then, should you be able
to choose the color of your baby's eyes? Maybe not so
much? I don't know.
Jose Portilla: Then there's also the ethical questions. The ethics of it
is something that ... I don't know. That will take a long
time to catch up to the technology.
Kirill Eremenko: For sure. What do you think? How far are we from
AGI?
Jose Portilla: It's funny. I was just having a conversation with
someone about this here at Udemy LIVE. Every time I
get asked this question, my timeline becomes shorter.
I remember when I was first asked the question, I was
like, "Never." And then I start building out networks
myself.
Jose Portilla: The one that really convinced me was the very first
couple years ago, when I really built up my first good
text generation network. I was like, "Oh, this is way
more effective than I thought it was going to be." I felt
like I don't know what I'm doing, and I'm actually able
to do something that could fool a person.
Kirill Eremenko: Imagine someone who knows what they're doing and
then ...
Jose Portilla: That's exactly what ...
Kirill Eremenko: Just steal them.
Jose Portilla: Yeah. And there's people way smarter than me working
on stuff that's way harder than this. Will it be in my
lifetime? I don't know. I definitely now believe that it
will be reached ... It's inevitable now at a certain point
in humanity there will be general AI, the singularity or
whatever.
Jose Portilla: Will it happen in my lifetime? I don't know. Hopefully
an old man on my deathbed, maybe it will become
more clear like, "Oh, yeah, in a couple years, we'll
[crosstalk 00:45:55]."
Kirill Eremenko: Oh, man, I think 100% in our lifetime. What's his
name? Ray Kurzweil. 2025 or 2029, that's the year.
And then 2050 is when AI becomes super intelligent,
like surpasses humans, and so on. Petitions for its
independent rights, and things like that. A classic
example is I think why we mistake it is because we're
used to linear thinking, and this stuff is happening
exponential ...
Jose Portilla: That's exponential. That's true. Yeah.
Kirill Eremenko: A great question I ask people. How far do you think
you'll get from here where we're sitting in 30 linear
steps? You know that one?
Jose Portilla: Yeah. Versus like the [crosstalk 00:46:37].
Kirill Eremenko: 30 exponential steps. Ridiculous. Ridiculous. I have
the same feeling. As every year passes, my timeline
gets closer, so like my expectation for this.
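Editor's note: the 30-steps comparison is easy to work out. The
sketch below assumes each linear step is one metre and that
"exponential" means every step is double the previous one,
starting at one metre. Neither figure is stated in the
conversation. The Earth circumference value is an approximation
added for scale.

```python
STEPS = 30
STEP_METRES = 1  # assumed length of the first step

# Linear: every step is the same length.
linear_total = STEPS * STEP_METRES

# Exponential: step k (counting from 0) is 2**k metres long.
exponential_total = sum(STEP_METRES * 2**k for k in range(STEPS))

EARTH_CIRCUMFERENCE_KM = 40_075  # approximate equatorial circumference
laps = exponential_total / 1000 / EARTH_CIRCUMFERENCE_KM

print(f"30 linear steps:      {linear_total} m")
print(f"30 exponential steps: {exponential_total:,} m")
print(f"that is about {laps:.1f} trips around the Earth")
```

Under these assumptions the linear walk covers 30 metres, while
the doubling walk covers 2**30 - 1 metres, a little over a million
kilometres.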
Jose Portilla: I think it may also be out of selfishness that I hope it
doesn't happen in my lifetime where it's like ...
Because then, I think something that's going to
happen is it's going to be a real question of what does
it mean to have consciousness? And what does it
mean to actually be human?
Jose Portilla: Because when it's replicated completely artificially, it's
going to be something that humans are going to have
to grapple with, and that's a very tough thing to think
about of, "Now, what does it mean to be human, have
a fulfilled life, have consciousness when this computer
has essentially all the same things?" Right?
Kirill Eremenko: Yeah. What's the difference? How can we discriminate
against them? And now all of a sudden, they're also
conscious.
Jose Portilla: Yeah.
Kirill Eremenko: Why do we consider ourselves better?
Jose Portilla: Exactly. Will they be second tier citizens when they're
actually smarter than us only because we created
them? Suddenly have some sort of power of them. Will
they live with us at the same level? This is a question
for someone much smarter than me to answer or think
about.
Kirill Eremenko: Some of the AI scientists or futurists think that our
generation and the next generation are the final
generations of humans who are here. I think it was
Elon Musk saying that we are biological ... What is it
called? What's the thing that starts computer? Boot
sequence or something. Like pre-boot, or whatever.
Kirill Eremenko: I forgot the word, but basically like this. When the
computer starts, there's a part that has to go first and
then boot up the rest of the computer on the
motherboard. That's like biological way to boot AI to
get it started. And then as soon as it's started, we're no
longer needed. We were just a phase in evolution that,
"Okay, now we've created AI." Boom, the end. And then
from then on, this new species, artificial intelligence,
robots, whatever, are going to take over the planet and
so on.
Jose Portilla: Yeah. And it makes you think if any civilization across
the cosmos, if it's some sort of inevitable conclusion
that once some organic system evolves enough, they
create artificial intelligence as the next step.
Kirill Eremenko: That's interesting. Quite possibly. What's his name?
And the interesting thing about AI, I was listening to a
podcast with Ben Goertzel recently, is that it won't be
us individually. We always are individually, and we try
to ... We strive for the sense of community. We want to
be on our phones all the time, Instagram. We want to
think as a collective mind, but it's hard for us, because
the way we do it is through phones and that's very
inefficient, very slow. Whereas AI is going to be hooked
up to the Internet.
Kirill Eremenko: So, it's not going to be individual AI. They're all going
to be in one big mega mind. It's like whenever you see
an AI, whether it's robots or a program whatever else,
they're all going to be thinking the same thing and
exchanging knowledge. And so therefore, for them, for
us, one day is going to be one day. For them, it's going
to be like one day is like 100,000 years in their
collective mind. And so they're going to evolve super
fast.
Jose Portilla: Yeah. We're so limited by our monkey brains of what
consciousness even means, right? When in reality,
once general AI is achieved, it's ... Maybe superior is
not the right word, but whatever their consciousness
is will be a higher level than what we are able to
achieve as some organic system. I mean, it'll almost be
like godlike.
Kirill Eremenko: That's the thing. There's a great article on Wait But
Why where he explains the ladder of consciousness. Try
to explain to an ant in ant language what a monkey is.
Like no freaking way.
Kirill Eremenko: Try to explain to a monkey in monkey language what a
human is or like what these moving things in the sky
are, which are airplanes. It's going to think of them as
stars. And same thing for us. Why do we think we're
the ultimate pinnacle of consciousness?
Kirill Eremenko: There is a level above us, which we'll never be able to
comprehend just simply because of the nature of how
our brains work and limitations. There's no way we
can ever understand. I think AI, I really think that it
will get to that level where it will be looking at us as
ants.
Jose Portilla: As ants. Yeah. No, for sure. I don't know. It's a
testament to our ignorance. When we think of AI, we
think of it as a copy of a human. What it really will be
like we created some superior God that will hopefully
be benevolent to us.
Kirill Eremenko: Benevolent. Yeah.
Jose Portilla: I'm very glad to be part of this generation though that
still doesn't have it. The questions that come up of
when general AI does exist are things that I'm glad
that I don't have to think about.
Kirill Eremenko: A very interesting time to be alive.
Jose Portilla: For sure. Yeah.
Kirill Eremenko: All right. Well, Jose, thank you so much for coming on
the show.
Jose Portilla: Thank you for having me.
Kirill Eremenko: What a pleasure. Where can our students or listeners
find you, connect with you, take your courses, follow
your career?
Jose Portilla: So, probably the easiest way is just if you Google
search my name, Jose Portilla, the first thing that pops
up is probably my Udemy page. So, you can always
check that out, my profile page in Udemy for different
courses. You can feel free to connect with me on
LinkedIn. Again, that's probably like the second link
on Google.
Jose Portilla: Or you can check out pieriandata.com. That's our little
company for data science stuff, but just Google Jose
Portilla and I'm maybe too accessible. You can easily
contact me either on LinkedIn or messaging on
Udemy.
Kirill Eremenko: Fantastic. And we'll definitely include those links in
the show notes. Pierian Data, by the way, we didn't
talk about this, but I want to give a shout out that you
do corporate trainings. So, if anybody is interested in
corporate trainings, check out Pierian Data. I heard
fantastic things about you.
Jose Portilla: Thank you very much.
Kirill Eremenko: Definitely have a look at that. On that note, we
probably better get back to the conference and great
chatting in person. I'm glad we did this.
Jose Portilla: Yeah, likewise. We'll have to do this again.
Kirill Eremenko: There you have it ladies and gentlemen. That was Jose
Portilla. I really hope you enjoyed this conversation as
much as I did, and we're super grateful for you being
part of this episode.
Kirill Eremenko: My favorite part by far was the conversation about
neural networks creating neural networks. That indeed
could be the future that we're heading towards where
AI builds AI, which builds AI, which builds AI, and so
on. And then we will live in a world that we probably
wouldn't even recognize today.
Kirill Eremenko: As always, the show notes for this episode are
available at superdatascience.com/309. That's
superdatascience.com/309. There, you can find the
transcript for this episode, any materials, research
papers, images that we mentioned on this episode.
And, of course, the URLs for Jose's LinkedIn, website,
and Udemy profile where you can find all of his
courses.
Kirill Eremenko: I highly encourage everybody to check out Jose's
courses on Udemy. And if you or your company are
interested in in-person corporate trainings, Jose is
doing a great job in that space. You can find him at his
website, pieriandata.com.
Kirill Eremenko: On that note, if you enjoyed this episode, forward it to
somebody you know, somebody who's passionate
about data science, analytics, AI, machine learning,
somebody who's learning these things online, or
maybe somebody who's already following Jose, and
you know that they love him and would love to hear
from him on this podcast.
Kirill Eremenko: It's very easy to share the episode. Just send the link,
superdatascience.com/309. On that note, thank you
so much for being here today. I really appreciate your
time, and I look forward to seeing you back here next
time. Until then, happy analyzing.