Machine Learning August 20, 2018

9-4 Using Polynomial Features in Logistic Regression

Word count: 5k. Reading time: 5 mins.

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(666)
X = np.random.normal(0,1,size=(200,2))
y = np.array(X[:,0]**2+X[:,1]**2 <1.5,dtype='int')
plt.scatter(X[y==0,0],X[y==0,1])
plt.scatter(X[y==1,0],X[y==1,1])
plt.show()...
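
A circular boundary like the one in this data cannot be separated by a straight line, so plain logistic regression needs extra features. The following is a minimal sketch, assuming scikit-learn's `PolynomialFeatures`, `StandardScaler`, and `LogisticRegression` chained in a `Pipeline`. The helper name `polynomial_logistic_regression` and the choice `degree=2` are illustrative assumptions and are not taken from the preview above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as the preview: points inside a circle are class 1.
np.random.seed(666)
X = np.random.normal(0, 1, size=(200, 2))
y = np.array(X[:, 0]**2 + X[:, 1]**2 < 1.5, dtype='int')


def polynomial_logistic_regression(degree):
    """Hypothetical helper: build a pipeline that expands the features
    polynomially, scales them, then applies logistic regression."""
    return Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('std_scaler', StandardScaler()),
        ('log_reg', LogisticRegression()),
    ])


# Training accuracy with the raw features versus degree-2 features.
linear_score = LogisticRegression().fit(X, y).score(X, y)
poly_score = polynomial_logistic_regression(degree=2).fit(X, y).score(X, y)
print(linear_score, poly_score)
```

With degree-2 features the squared terms are available to the model, so the circular boundary becomes linear in the expanded feature space.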

Machine Learning August 20, 2018

9-3 Implementing Logistic Regression

Word count: 2.7k. Reading time: 2 mins.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load iris, then keep only the first two classes and the first two features
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y<2, :2]
y = y[y<2]
print(X.shape)  # (100, 2)
print(y.shape)  # (100,)
plt.scatter(X[y==0,0],X[y==0,1],color='...
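
The preview stops before any training code. As a rough sketch of what implementing logistic regression can look like, the block below fits the parameters with batch gradient descent on the cross-entropy loss. The function name `fit_logistic_gd`, the learning rate, the iteration count, and the synthetic two-class data (used here in place of iris) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic_gd(X, y, eta=0.1, n_iters=5000):
    """Hypothetical helper: batch gradient descent for logistic regression."""
    X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the intercept column
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        # Gradient of the mean cross-entropy loss with respect to theta
        gradient = X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)
        theta -= eta * gradient
    return theta


# Synthetic, well-separated two-class data (an assumption, not the iris data)
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

theta = fit_logistic_gd(X, y)
X_b = np.hstack([np.ones((len(X), 1)), X])
y_pred = (sigmoid(X_b @ theta) >= 0.5).astype(int)
accuracy = float(np.mean(y_pred == y))
print(theta, accuracy)
```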

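Machine Learning August 20, 2018

9-1 The Logistic Regression Algorithm

Word count: 1.2k. Reading time: 1 min.

Solving classification problems

  • How can a regression method solve a classification problem?
    Link a sample's features to the probability that the event occurs for that sample. A probability is a number, so a regression-style model can predict it.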


import numpy as np
import matplotlib.pyplot as plt
def sigmoid(t):
    return 1 / (1 + np.exp(-t))
x = np.linspace(-10,10,500)
y = sigmoid(x)

plt.plot(x,y)
plt.show()
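
To connect the sigmoid curve back to classification, here is a small sketch that turns a linear score into a probability and then into a class label. The parameter vector `theta`, the helper name `predict_proba`, and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(t):
    return 1 / (1 + np.exp(-t))


def predict_proba(X, theta):
    """Hypothetical helper: probability of class 1 for each row of X."""
    X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the intercept column
    return sigmoid(X_b @ theta)


theta = np.array([0.0, 1.0])            # assumed parameters: intercept 0, slope 1
X = np.array([[-2.0], [0.0], [2.0]])
proba = predict_proba(X, theta)
labels = (proba >= 0.5).astype(int)     # classify as 1 when the probability is at least 0.5
print(proba)
print(labels)
```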


Question:

Given a sample dataset X, y, how do we find the parameters theta such that...


Machine Learning August 20, 2018

6-6 LASSO Regularization

Word count: 4.8k. Reading time: 4 mins.


Least Absolute Shrinkage and Selection Operator Regression

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.random.uniform(-3.0,3.0,size=100)
X = x.reshape(-1,1)
y = 0.5 * x + 3 + np.random.normal(0,1,size=100)

plt.scatter...
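
The "selection" part of the name refers to LASSO driving some coefficients to exactly zero. The sketch below illustrates this on the same synthetic data, assuming scikit-learn's `Lasso` inside a `Pipeline`. The polynomial degree, `alpha=0.1`, and `max_iter` are assumed values, not ones taken from the article.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

np.random.seed(42)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)

# Deliberately over-flexible model: degree-20 polynomial with an L1 penalty
lasso_reg = Pipeline([
    ('poly', PolynomialFeatures(degree=20, include_bias=False)),
    ('std_scaler', StandardScaler()),
    ('lasso', Lasso(alpha=0.1, max_iter=100000)),
])
lasso_reg.fit(X, y)

coef = lasso_reg.named_steps['lasso'].coef_
n_zero = int(np.sum(coef == 0))
print(n_zero, 'of', coef.size, 'coefficients are exactly zero')
```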

Machine Learning August 20, 2018

6-5 Model Regularization

Word count: 3.9k. Reading time: 4 mins.

Model regularization: constrain the magnitude of the parameters

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.random.uniform(-3.0,3.0,size=100)
X = x.reshape(-1,1)
y = 0.5 * x + 3 + np.random.normal(0,1,size=100)
plt.scatter(x, y)
plt.show()

# Use polynomial regression

from sklearn....
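
To make "constraining the magnitude of the parameters" concrete, the sketch below compares ordinary least squares with ridge regression (an L2 penalty) in closed form using NumPy only. The degree, `alpha=1.0`, and the helper name `ridge_coefficients` are assumptions; this is not the article's scikit-learn code.

```python
import numpy as np

np.random.seed(42)
x = np.random.uniform(-3.0, 3.0, size=100)
y = 0.5 * x + 3 + np.random.normal(0, 1, size=100)

# Degree-10 polynomial features, standardized column by column
degree = 10
X_poly = x.reshape(-1, 1) ** np.arange(1, degree + 1)
X_std = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)
y_centered = y - y.mean()  # centering y leaves the intercept unpenalized


def ridge_coefficients(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha * I) theta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


theta_ols = np.linalg.lstsq(X_std, y_centered, rcond=None)[0]
theta_ridge = ridge_coefficients(X_std, y_centered, alpha=1.0)
print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))
```

The penalty term shrinks the coefficient vector, so the ridge solution has a smaller norm than the unregularized one.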

Machine Learning August 20, 2018

6-4 The Bias-Variance Trade-off

Word count: 1.1k. Reading time: 1 min.

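Model error = Bias + Variance + irreducible error


Bias and variance usually pull in opposite directions: lowering bias raises variance, and lowering variance raises bias.

The main challenge in machine learning comes from variance.

Common ways to reduce high variance

  1. Reduce model complexity
  2. Reduce the dimensionality of the data; denoise
  3. Increase the number of samples
  4. Use a validation set
  5. Apply model regularization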
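
The trade-off can be seen by fitting polynomials of increasing degree to noisy quadratic data. This is a sketch under assumed settings (the data-generating function, the 75/25 split, and the degrees 1, 2, and 10 are all illustrative). Degree 1 is too rigid (high bias), while degree 10 has enough freedom to chase noise (high variance), which typically shows up as a gap between training and held-out error.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=100)
y = 0.5 * x**2 + x + 2 + rng.normal(0, 1, size=100)

# Simple hold-out split: first 75 samples for training, last 25 for testing
x_train, y_train = x[:75], y[:75]
x_test, y_test = x[75:], y[75:]


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 2, 10):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, results[degree])  # (training MSE, test MSE)
```

Training error can only go down as the degree grows, which is why it cannot be used on its own to choose model complexity.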

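Machine Learning August 20, 2018

6-3 Validation Data and Cross-Validation

Word count: 5.4k. Reading time: 5 mins.

The solution is to split the data into three parts: training data, validation data, and test data.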
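
A three-way split can be sketched with NumPy alone, as below. The helper name `train_val_test_split`, the 60/20/20 ratios, and the toy arrays are assumptions for illustration. The model is tuned against the validation part, and the test part is touched only once at the end.

```python
import numpy as np


def train_val_test_split(X, y, val_ratio=0.2, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle the indices, then slice them three ways."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    n_val = int(len(X) * val_ratio)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))


# Toy data: 100 samples with 2 features each
X = np.arange(200).reshape(100, 2)
y = np.arange(100)
(X_train, y_train), (X_val, y_val), (X_test, y_test) = train_val_test_split(X, y, seed=666)
print(X_train.shape, X_val.shape, X_test.shape)  # (60, 2) (20, 2) (20, 2)
```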

import numpy as np
from sklearn import datasets

Machine Learning August 20, 2018

5-4 Denoising with PCA

Word count: 3.1k. Reading time: 3 mins.

import numpy as np
import matplotlib.pyplot as plt
X = np.empty((100,2))
X[:,0] = np.random.uniform(0.,100.,size=100)
X[:,1] = 0.75 * X[:,0] + 3. + np.random.normal(0,5,size=100)
plt.scatter(X[:,0],X[:,1])
plt.show()

from sklearn.decomposition ...
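
The preview is cut off at the import. One common way to denoise such data, sketched here under assumptions, is to project onto the first principal component and map back with `inverse_transform`, which discards the variation along the minor axis. The seed and the variable names `X_reduction` and `X_restore` are additions made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(666)  # seed added for reproducibility; the preview does not set one
X = np.empty((100, 2))
X[:, 0] = np.random.uniform(0., 100., size=100)
X[:, 1] = 0.75 * X[:, 0] + 3. + np.random.normal(0, 5, size=100)

# Keep one component, then map the 1-D representation back to 2-D
pca = PCA(n_components=1)
X_reduction = pca.fit_transform(X)
X_restore = pca.inverse_transform(X_reduction)
print(X_reduction.shape, X_restore.shape)  # (100, 1) (100, 2)
```

After the round trip, every restored point lies on the line spanned by the first principal component.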

Machine Learning August 20, 2018

5-1 Multiple Linear Regression

Word count: 11k. Reading time: 10 mins.



import numpy as np
from sklearn.metrics import r2_score


class LinearRegression:
    def __init__(self):
        """Initialize the Linear Regression model"""
        self.coef_ = None  # coefficients
        self.intercept_ = None  # intercept
        self._theta = None

    def fit_normal(self...
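
The preview is cut off at `fit_normal`, and its body is not shown here. As a separate, minimal sketch (not the article's class), the normal equation theta = (X_b^T X_b)^(-1) X_b^T y can be solved with NumPy as follows. The function name `fit_normal_equation` and the noise-free test data are assumptions.

```python
import numpy as np


def fit_normal_equation(X_train, y_train):
    """Hypothetical helper: least-squares fit via the normal equation."""
    X_b = np.hstack([np.ones((len(X_train), 1)), X_train])  # intercept column
    # Solve (X_b^T X_b) theta = X_b^T y rather than forming the inverse explicitly
    theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_train)
    return theta[0], theta[1:]  # intercept, coefficients


# Noise-free data generated from y = 3 + 2*x1 - 1*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

intercept, coef = fit_normal_equation(X, y)
print(intercept, coef)
```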