Coefficient Of Determination

 

Hello everybody,

Today I want to describe some ideas about measuring the quality of learning. 

First of all, I want to point out the areas where you can apply these measurements. There are three:

  1. As the objective function to optimize during learning
  2. For picking hyperparameters
  3. For evaluating the finished model

You can also combine them: measure quality during learning with one metric, but analyze the final model with another. 

MSE

So, let's start with the most common formula, mean squared error:

MSE = (1/n) · Σᵢ (ŷᵢ − yᵢ)²

In words it reads the following: take the difference between the predicted value and the desired value, square it, sum over all examples, and finally average. 

MSE has the following features:

  1. Easy to minimize (smooth and differentiable)
  2. Punishes bigger mistakes more strongly

What does this mean in practice? If your training data set contains many anomalies (outliers), then MSE is definitely not the metric to apply, because MSE will make the model learn the anomalies. And you don't want your model to learn anomalies, right? But if your data is free of anomalies, then MSE is a really good choice.
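As a minimal sketch (plain Python, with made-up numbers) of how a single anomaly dominates MSE:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: one anomalous answer (100) dominates the whole error.
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 4.0]
print(mse(y_true, y_pred))  # (0 + 0 + 0 + 96**2) / 4 = 2304.0
```

Three of the four predictions are perfect, yet the squared anomaly makes the average error huge.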

MAE

Another common choice among data scientists is mean absolute error:

MAE = (1/n) · Σᵢ |ŷᵢ − yᵢ|

In words, it is the averaged absolute value of the difference between the desired output and the actual output.

It has the following features:

  1. Harder to minimize (the absolute value is not differentiable at zero)
  2. Punishes bigger mistakes less strongly

In practice this means that if your training set has plenty of anomalies, then MAE is one of the functions to consider.
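A minimal plain-Python sketch, again with made-up numbers, showing MAE's milder reaction to a single anomaly:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: the same single anomaly only shifts MAE linearly.
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 4.0]
print(mae(y_true, y_pred))  # (0 + 0 + 0 + 96) / 4 = 24.0
```

The anomaly contributes its raw distance (96), not its square (9216), so it pulls the average far less.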

Coefficient of determination

Mean squared error has an interesting modification, the coefficient of determination. Take a look at the formula:

R² = 1 − Σᵢ (ŷᵢ − yᵢ)² / Σᵢ (yᵢ − ȳ)²

where ȳ is the average answer value.

The main part of the coefficient is the fraction, whose numerator is the sum of squared model errors, while the denominator is the sum of squared deviations of the answers from their mean. 

So, what does the coefficient of determination explain? It shows which part of the total variance of the answers is explained by the model. Its value is also easy to interpret. 

It has the following features:

For workable models the coefficient of determination is between 0 and 1. 

If the coefficient of determination is equal to 1, then we have built an ideal model.

If the coefficient of determination is equal to 0, then the model is no better than predicting a constant value (the average answer).

If the coefficient of determination is smaller than 0, then the model is worse than a constant value.
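A minimal sketch (plain Python, hypothetical numbers) of the boundary cases above:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2(y_true, y_true))                # 1.0 -- ideal model
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0 -- same as predicting the mean
```

Any model whose errors exceed the spread of the answers themselves would land below zero.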

Asymmetric error

Consider the following scenario. You are the owner of a shop that sells laptops, and you face the question: how many laptops should you preorder? A related question: maybe it's better to have a few more laptops than needed? For such cases you may want a stronger punishment for under-forecasting than for over-forecasting. One example of a function that can be used here is the quantile error. Take a look at the formula:

L_τ = (1/n) · Σᵢ ρ_τ(yᵢ − ŷᵢ), where ρ_τ(u) = τ·u for u ≥ 0 and (τ − 1)·u for u < 0

It looks pretty complicated, so let's go through it in some detail.

The parameter τ ∈ [0, 1] defines which to punish more strongly: over-forecasts or under-forecasts. 

If τ is closer to 1, then the model is punished more for under-forecasts; otherwise, for over-forecasts. 

If this formula looks complicated, here is a step-by-step explanation:

So,

step #1: calculate the difference between the desired output and the model output;

step #2: choose the multiplier;

step #3: in case of an under-forecast (positive difference) multiply by τ, in case of an over-forecast (negative difference) multiply by τ − 1, and sum everything up.
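The steps above can be sketched in plain Python (τ = 0.75 and the numbers are chosen just for illustration):

```python
def quantile_loss(y_true, y_pred, tau):
    """Quantile (pinball) error. tau in [0, 1]; tau closer to 1
    punishes under-forecasts (y_pred < y_true) more strongly."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p                   # desired output minus model output
        if diff >= 0:                  # under-forecast (or exact hit)
            total += tau * diff
        else:                          # over-forecast
            total += (tau - 1) * diff  # (tau - 1) < 0 and diff < 0 => positive
    return total / len(y_true)

# With tau = 0.75, missing 10 units low costs 3x more than missing 10 units high.
print(quantile_loss([100], [90], 0.75))   # 0.75 * 10 = 7.5
print(quantile_loss([100], [110], 0.75))  # (0.75 - 1) * (-10) = 2.5
```

For the laptop shop, a τ above 0.5 makes running out of stock hurt more than over-ordering, which is exactly the asymmetry we wanted.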


Types Of Learning In AI

 

Hello everybody,

today I want to make a short note on a question that I often receive: what kinds of learning exist in Machine Learning? I want to provide a simple answer:

  1. Learning with a teacher (supervised): questions and answers.
  2. Learning without a teacher (unsupervised): just questions.
  3. Partial learning (semi-supervised): questions, and some of them with answers.
  4. Active learning: the algorithm chooses which objects to get answers for.

I suppose some variations of these also exist, but usually ML in one way or another works with these four. 

Consider the example of clustering. You have some data set of values, and you need to group them somehow, or in other words, find groups of similar objects. This task has two problems. First, nobody knows the number of groups. Second, we don't know the real clusters, the ones we actually want to distinguish. That creates a challenge: you can't evaluate the answer from the ML algorithm by yourself. While reading this example you may wonder, who needs such a task? I'll give you a few examples:

  • Segmentation of users for a mobile network operator or for some e-shop
  • Search for similar users in social networks
  • Search in the genome for similar expression profiles
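As a rough illustration of grouping without a teacher, here is a tiny k-means sketch on made-up one-dimensional data (the points and k = 2 are assumptions for illustration; real libraries such as scikit-learn provide full implementations):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Tiny k-means for 1-D data: assign each point to the nearest
    centroid, then move each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Made-up data: two obvious groups, around 1 and around 10.
points = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
print(kmeans_1d(points, 2))  # two centroids, near 1.0 and 10.0
```

Note that we had to pick k = 2 ourselves; that is exactly the "nobody knows the number of groups" problem from above.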

A second example is the task of visualizing some group. Imagine that you have some set of data and want not just to know how many groups you have, but also to have some visual representation of them. 

A third example is the search for anomalies. Consider the following scenario. You have a successful web site with many visitors: tens of thousands. Among them there may be one or two hackers, or no hackers at all. How do you detect them? This is also learning without a teacher.


Machine Learning Certificate From Coursera

 

Hello everybody,

I want to boast that I received a certificate from Stanford University about my level of knowledge in Machine Learning.

Here is the link:

https://www.coursera.org/maestro/api/certificate/get_certificate?course_id=973756

and here is screenshot:


Principal Component Analysis In Machine Learning

 

Hello everybody,

today I want to note some details of Machine Learning that are important to me. 

So, the first and very important usage of PCA is visualizing data. If you have 10 dimensions, can you visualize that data? If you can, I'm happy for you, but I can't. I can imagine only 1, 2 or 3 dimensions :). But with principal component analysis it's possible to visualize the data. 

The second application is reducing the memory/disk space needed to store data. That's quite self-explanatory: training on 10 000 dimensions and on 100 dimensions are very different things.

The third is speeding up the learning algorithm. It's actually related to the second.

Another important detail: it's a bad idea to use PCA in order to avoid overfitting. PCA throws away features without ever looking at the answers, so it may discard useful information; if overfitting is the problem, regularization is the better tool.
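A minimal numpy sketch of the idea, projecting made-up 3-D points onto their single top principal component (a textbook eigendecomposition of the covariance matrix, not a production implementation):

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components
    (eigenvectors of the covariance matrix with largest eigenvalues)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues ascending
    top = eigvecs[:, ::-1][:, :n_components] # largest-variance directions
    return Xc @ top

# Hypothetical data: 3-D points that really vary along a single line.
X = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0],
              [3.0, 3.0, 3.0],
              [4.0, 4.0, 4.0]])
Z = pca_project(X, 1)
print(Z.shape)  # (4, 1) -- reduced from 3 dimensions to 1
```

Because the points lie on a line, one component keeps all the variation: storage drops by a factor of three with no information lost.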


Debugging Learning Algorithm

 

A few notes on how to debug a machine learning algorithm:

  1. Get more training examples
  2. Try smaller sets of features
  3. Try getting additional features
  4. Try adding polynomial features
  5. Try increasing lambda
  6. Try decreasing lambda

What those attempts can achieve:

  1. fixes high variance
  2. fixes high variance
  3. fixes high bias
  4. fixes high bias
  5. fixes high variance
  6. fixes high bias

1 Comment

  • Alex Turok said

    Hi Yuriy,

    I've just stumbled upon your blog. A cool mix of machine learning and Acumatica development insights! That's really great, keep it up!

 
