The least squares method is a statistical technique used in regression analysis to find the best-fitting trend line for a data set. It finds the line that best represents the overall direction of the data. Each data point represents the relation between an independent variable and a dependent variable.

The result of fitting a set of data points with a quadratic function
Conic fitting a set of points using least-squares approximation

History

The method was the culmination of several advances that took place during the course of the eighteenth century:[1]

  • The combination of different observations as the best estimate of the true value: errors decrease with aggregation rather than increase. The idea first appeared in Isaac Newton's work in 1671, though it went unpublished, and again in 1700.[2][3] It was perhaps first expressed formally by Roger Cotes in 1722.
  • The combination of different observations taken under the same conditions, as opposed to simply trying one's best to observe and record a single observation accurately. The approach was known as the method of averages. It was notably used by Newton while studying equinoxes in 1700, when he also wrote down the first of the 'normal equations' known from ordinary least squares,[4] by Tobias Mayer while studying the librations of the Moon in 1750, and by Pierre-Simon Laplace in his work explaining the differences in motion of Jupiter and Saturn in 1788.
  • The combination of different observations taken under different conditions. The method came to be known as the method of least absolute deviation. It was notably performed by Roger Joseph Boscovich in his work on the shape of the Earth in 1757 and by Pierre-Simon Laplace for the same problem in 1789 and 1799.
  • The development of a criterion that can be evaluated to determine when the solution with the minimum error has been achieved. Laplace tried to specify a mathematical form of the probability density for the errors and define a method of estimation that minimizes the error of estimation. For this purpose, Laplace used a symmetric two-sided exponential distribution, now called the Laplace distribution, to model the error distribution, and used the sum of absolute deviations as the error of estimation. He felt these to be the simplest assumptions he could make, and he had hoped to obtain the arithmetic mean as the best estimate. Instead, his estimator was the posterior median.

The method

Carl Friedrich Gauss

The first clear and concise exposition of the method of least squares was published by Legendre in 1805.[5] The technique is described as an algebraic procedure for fitting linear equations to data and Legendre demonstrates the new method by analyzing the same data as Laplace for the shape of the Earth. Within ten years after Legendre's publication, the method of least squares had been adopted as a standard tool in astronomy and geodesy in France, Italy, and Prussia, which constitutes an extraordinarily rapid acceptance of a scientific technique.[1]

In 1809 Carl Friedrich Gauss published his method of calculating the orbits of celestial bodies. In that work he claimed to have been in possession of the method of least squares since 1795.[6] This naturally led to a priority dispute with Legendre. However, to Gauss's credit, he went beyond Legendre and succeeded in connecting the method of least squares with the principles of probability and to the normal distribution. He had managed to complete Laplace's program of specifying a mathematical form of the probability density for the observations, depending on a finite number of unknown parameters, and define a method of estimation that minimizes the error of estimation. Gauss showed that the arithmetic mean is indeed the best estimate of the location parameter by changing both the probability density and the method of estimation. He then turned the problem around by asking what form the density should have and what method of estimation should be used to get the arithmetic mean as estimate of the location parameter. In this attempt, he invented the normal distribution.

An early demonstration of the strength of Gauss's method came when it was used to predict the future location of the newly discovered asteroid Ceres. On 1 January 1801, the Italian astronomer Giuseppe Piazzi discovered Ceres and was able to track its path for 40 days before it was lost in the glare of the Sun. Based on these data, astronomers desired to determine the location of Ceres after it emerged from behind the Sun without solving Kepler's complicated nonlinear equations of planetary motion. The only predictions that successfully allowed Hungarian astronomer Franz Xaver von Zach to relocate Ceres were those performed by the 24-year-old Gauss using least-squares analysis.

In 1810, after reading Gauss's work and proving the central limit theorem, Laplace used it to give a large-sample justification for the method of least squares and the normal distribution. In 1822, Gauss was able to state that the least-squares approach to regression analysis is optimal in the sense that in a linear model where the errors have a mean of zero, are uncorrelated, normally distributed, and have equal variances, the best linear unbiased estimator of the coefficients is the least-squares estimator. An extended version of this result is known as the Gauss–Markov theorem.

The idea of least-squares analysis was also independently formulated by the American Robert Adrain in 1808. In the next two centuries workers in the theory of errors and in statistics found many different ways of implementing least squares.[7]

Problem statement

The objective consists of adjusting the parameters of a model function to best fit a data set. A simple data set consists of n points (data pairs) $(x_i, y_i)$, i = 1, …, n, where $x_i$ is an independent variable and $y_i$ is a dependent variable whose value is found by observation. The model function has the form $f(x, \boldsymbol{\beta})$, where m adjustable parameters are held in the vector $\boldsymbol{\beta}$. The goal is to find the parameter values for the model that "best" fits the data. The fit of a model to a data point is measured by its residual, defined as the difference between the observed value of the dependent variable and the value predicted by the model: $r_i = y_i - f(x_i, \boldsymbol{\beta})$.

The residuals are plotted against corresponding $x$ values. The random fluctuations about $r_i = 0$ indicate a linear model is appropriate.

The least-squares method finds the optimal parameter values by minimizing the sum of squared residuals, $S$:[8] $S = \sum_{i=1}^{n} r_i^2$.

In the simplest case $f(x_i, \boldsymbol{\beta}) = \beta$ and the result of the least-squares method is the arithmetic mean of the input data.

An example of a model in two dimensions is that of the straight line. Denoting the y-intercept as $\beta_0$ and the slope as $\beta_1$, the model function is given by $f(x, \boldsymbol{\beta}) = \beta_0 + \beta_1 x$. See linear least squares for a fully worked out example of this model.
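As a minimal sketch of the straight-line case, the following Python snippet computes the intercept and slope that minimize the sum of squared residuals, using the standard closed-form expressions for a single independent variable. The function names `fit_line` and `sum_of_squares` and the data values are illustrative assumptions, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and centered sum of squares of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared residuals S for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data, roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
b0, b1 = fit_line(xs, ys)
best = sum_of_squares(xs, ys, b0, b1)
print(f"intercept={b0:.3f} slope={b1:.3f} S={best:.4f}")
```

Nudging either fitted parameter away from its value should only increase `sum_of_squares`, which is one quick way to sanity-check a fit.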

A data point may consist of more than one independent variable. For example, when fitting a plane to a set of height measurements, the plane is a function of two independent variables, x and z, say. In the most general case there may be one or more independent variables and one or more dependent variables at each data point.

A residual plot showing random fluctuations about $r_i = 0$ indicates that a linear model $(Y_i = \beta_0 + \beta_1 x_i + U_i)$ is appropriate, where $U_i$ is an independent random variable.[8]

The residuals are plotted against the corresponding $x$ values. The parabolic shape of the fluctuations about $r_i = 0$ indicates a parabolic model is appropriate.

If the residual points had some sort of a shape and were not randomly fluctuating, a linear model would not be appropriate. For example, if the residual plot had a parabolic shape, a parabolic model $(Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + U_i)$ would be appropriate for the data. The residuals for a parabolic model can be calculated via $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i - \hat{\beta}_2 x_i^2$.[8]
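The residual check described above can be sketched in a few lines of Python. The example assumes noise-free data generated from a parabola and fits a straight line to it; the helper `fit_line` and the data are illustrative assumptions. The residuals are positive at both ends and negative in the middle, which is the systematic shape that signals a linear model is inadequate.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Assumed data: exact values of y = x^2 on a symmetric grid.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

b0, b1 = fit_line(xs, ys)
# Residual r_i = observed value minus the value predicted by the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)
```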

Advantages

One of the main benefits of using this method is that it is easy to apply and understand. In its simplest form it uses only two variables (one shown along the x-axis and the other along the y-axis) and highlights the best relationship between them.

Investors and analysts can use the least squares method to analyze past performance and make predictions about future trends in the economy and stock markets. As such, it can be used as a decision-making tool.

Limitations

This regression formulation considers only observational errors in the dependent variable (but the alternative total least squares regression can account for errors in both variables). There are two rather different contexts with different implications:

  • Regression for prediction. Here a model is fitted to provide a prediction rule for application in a similar situation to which the data used for fitting apply. Here the dependent variables corresponding to such future application would be subject to the same types of observation error as those in the data used for fitting. It is therefore logically consistent to use the least-squares prediction rule for such data.
  • Regression for fitting a "true relationship". In standard regression analysis that leads to fitting by least squares there is an implicit assumption that errors in the independent variable are zero or strictly controlled so as to be negligible. When errors in the independent variable are non-negligible, models of measurement error can be used; such methods can lead to parameter estimates, hypothesis testing and confidence intervals that take into account the presence of observation errors in the independent variables.[9] An alternative approach is to fit a model by total least squares; this can be viewed as taking a pragmatic approach to balancing the effects of the different sources of error in formulating an objective function for use in model-fitting.

Solving the least squares problem

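The minimum of the sum of squares is found by setting the gradient to zero. Since the model contains m parameters, there are m gradient equations: $\frac{\partial S}{\partial \beta_j} = 2 \sum_{i} r_i \frac{\partial r_i}{\partial \beta_j} = 0,\ j = 1, \ldots, m$, and since $r_i = y_i - f(x_i, \boldsymbol{\beta})$, the gradient equations become $-2 \sum_{i} r_i \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} = 0,\ j = 1, \ldots, m$.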

The gradient equations apply to all least squares problems. Each particular problem requires particular expressions for the model and its partial derivatives.[10]
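As a short worked instance of the gradient equations, consider the simplest model, a single constant parameter $\beta$ (so m = 1). The derivation below is a sketch using only the definitions of the residual and the sum of squares; it recovers the arithmetic mean as the least-squares estimate.

```latex
S(\beta) = \sum_{i=1}^{n} (y_i - \beta)^2, \qquad
\frac{\partial S}{\partial \beta} = -2 \sum_{i=1}^{n} (y_i - \beta) = 0
\;\Longrightarrow\;
\hat{\beta} = \frac{1}{n} \sum_{i=1}^{n} y_i .
```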

Linear least squares

A regression model is a linear one when the model comprises a linear combination of the parameters, i.e., $f(x, \boldsymbol{\beta}) = \sum_{j=1}^{m} \beta_j \phi_j(x)$, where the function $\phi_j$ is a function of $x$.[10]

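Letting $X_{ij} = \phi_j(x_i)$ and putting the independent and dependent variables in matrices $X$ and $Y$, respectively, we can compute the least squares in the following way. Note that $D$ is the set of all data.[10][11] $L(D, \boldsymbol{\beta}) = \|Y - X\boldsymbol{\beta}\|^2 = (Y - X\boldsymbol{\beta})^{\mathsf{T}} (Y - X\boldsymbol{\beta}) = Y^{\mathsf{T}} Y - 2 Y^{\mathsf{T}} X \boldsymbol{\beta} + \boldsymbol{\beta}^{\mathsf{T}} X^{\mathsf{T}} X \boldsymbol{\beta}$

The gradient of the loss is: $\frac{\partial L(D, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2 X^{\mathsf{T}} Y + 2 X^{\mathsf{T}} X \boldsymbol{\beta}$

Setting the gradient of the loss to zero and solving for $\boldsymbol{\beta}$, we get:[11][10] $-2 X^{\mathsf{T}} Y + 2 X^{\mathsf{T}} X \boldsymbol{\beta} = 0 \Rightarrow X^{\mathsf{T}} Y = X^{\mathsf{T}} X \boldsymbol{\beta}$, so that $\hat{\boldsymbol{\beta}} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y$.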
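The closed-form solution can be sketched in plain Python by forming the normal equations and solving them with Gaussian elimination. The function names, the choice of basis functions (1, x, x²) and the data are assumptions made for illustration. In practice, numerical libraries usually prefer QR or SVD factorizations, because forming the product of the design matrix with its transpose explicitly can amplify rounding error.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def linear_least_squares(design, y):
    """Return beta solving the normal equations (X^T X) beta = X^T y."""
    m = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(m)]
           for j in range(m)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(m)]
    return solve_linear_system(xtx, xty)


# Assumed basis functions: phi_1(x) = 1, phi_2(x) = x, phi_3(x) = x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]  # exact, noise-free data
design = [[1.0, x, x * x] for x in xs]
beta = linear_least_squares(design, ys)
print(beta)
```

Because the data are noise-free and lie exactly on a parabola, the fitted coefficients should match the generating ones up to rounding.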

Non-linear least squares

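There is, in some cases, a closed-form solution to a non-linear least squares problem, but in general there is not. When no closed-form solution exists, numerical algorithms are used to find the value of the parameters $\boldsymbol{\beta}$ that minimizes the objective. Most algorithms involve choosing initial values for the parameters. Then, the parameters are refined iteratively, that is, the values are obtained by successive approximation: $\beta_j^{k+1} = \beta_j^{k} + \Delta\beta_j$, where a superscript k is an iteration number, and the vector of increments $\Delta\boldsymbol{\beta}$ is called the shift vector. In some commonly used algorithms, at each iteration the model may be linearized by approximation to a first-order Taylor series expansion about $\boldsymbol{\beta}^{k}$: $f(x_i, \boldsymbol{\beta}) \approx f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j} \frac{\partial f(x_i, \boldsymbol{\beta}^{k})}{\partial \beta_j} \left(\beta_j - \beta_j^{k}\right) = f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j} J_{ij}\, \Delta\beta_j$.

The Jacobian J is a function of constants, the independent variable and the parameters, so it changes from one iteration to the next. The residuals are given by $r_i = y_i - f(x_i, \boldsymbol{\beta}^{k}) - \sum_{j=1}^{m} J_{ij}\, \Delta\beta_j = \Delta y_i - \sum_{j=1}^{m} J_{ij}\, \Delta\beta_j$, where $\Delta y_i = y_i - f(x_i, \boldsymbol{\beta}^{k})$.

To minimize the sum of squares of $r_i$, the gradient equation is set to zero and solved for $\Delta\beta_j$: $-2 \sum_{i=1}^{n} J_{ij} \left(\Delta y_i - \sum_{k=1}^{m} J_{ik}\, \Delta\beta_k\right) = 0$, which, on rearrangement, become m simultaneous linear equations, the normal equations: $\sum_{i=1}^{n} \sum_{k=1}^{m} J_{ij} J_{ik}\, \Delta\beta_k = \sum_{i=1}^{n} J_{ij}\, \Delta y_i \quad (j = 1, \ldots, m)$.

The normal equations are written in matrix notation as $\left(J^{\mathsf{T}} J\right) \Delta\boldsymbol{\beta} = J^{\mathsf{T}} \Delta\mathbf{y}$.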

These are the defining equations of the Gauss–Newton algorithm.
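The iteration can be sketched for a two-parameter model. The example below assumes the model y = a·exp(b·x), noise-free data, and a starting guess reasonably close to the solution; the function name and all numeric values are illustrative. It is a bare Gauss–Newton loop with no step-size control, so it may fail to converge from a poor starting point.

```python
import math


def gauss_newton_exponential(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Fit y = a * exp(b * x) by Gauss-Newton, starting from (a, b)."""
    for _ in range(max_iter):
        # Delta y_i and the Jacobian columns df/da, df/db at the current iterate.
        dy = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]
        ja = [math.exp(b * x) for x in xs]
        jb = [a * x * math.exp(b * x) for x in xs]
        # Entries of J^T J and J^T dy for the 2x2 normal equations.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, dy))
        gb = sum(v * r for v, r in zip(jb, dy))
        det = saa * sbb - sab * sab
        if det == 0.0:
            raise ValueError("normal equations are singular")
        # Shift vector from Cramer's rule, then update the parameters.
        da = (ga * sbb - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Assumed data generated from a = 2, b = 0.5 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton_exponential(xs, ys, a=1.5, b=0.4)
print(a_hat, b_hat)
```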

Differences between linear and nonlinear least squares

  • The model function, f, in LLSQ (linear least squares) is a linear combination of parameters of the form $f = X_{i1}\beta_1 + X_{i2}\beta_2 + \cdots$. The model may represent a straight line, a parabola or any other linear combination of functions. In NLLSQ (nonlinear least squares) the parameters appear as functions, such as $\beta^2$, $e^{\beta x}$ and so forth. If the derivatives $\partial f / \partial \beta_j$ are either constant or depend only on the values of the independent variable, the model is linear in the parameters. Otherwise, the model is nonlinear.
  • Initial values for the parameters are needed to find the solution to an NLLSQ problem; LLSQ does not require them.
  • Solution algorithms for NLLSQ often require that the Jacobian can be calculated, similar to LLSQ. Analytical expressions for the partial derivatives can be complicated. If analytical expressions are impossible to obtain, either the partial derivatives must be calculated by numerical approximation or an estimate must be made of the Jacobian, often via finite differences.
  • Non-convergence (failure of the algorithm to find a minimum) is a common phenomenon in NLLSQ.
  • The LLSQ objective is globally convex, so non-convergence is not an issue.
  • Solving NLLSQ is usually an iterative process which has to be terminated when a convergence criterion is satisfied. LLSQ solutions can be computed using direct methods, although problems with large numbers of parameters are typically solved with iterative methods, such as the Gauss–Seidel method.
  • In LLSQ the solution is unique, but in NLLSQ there may be multiple minima in the sum of squares.
  • Under the condition that the errors are uncorrelated with the predictor variables, LLSQ yields unbiased estimates, but even under that condition NLLSQ estimates are generally biased.

These differences must be considered whenever the solution to a nonlinear least squares problem is being sought.[10]

Example

Consider a simple example drawn from physics. A spring should obey Hooke's law, which states that the extension of a spring y is proportional to the force, F, applied to it. $y = f(F, k) = kF$ constitutes the model, where F is the independent variable. In order to estimate the force constant, k, we conduct a series of n measurements with different forces to produce a set of data, $(F_i, y_i),\ i = 1, \ldots, n$, where $y_i$ is a measured spring extension.[12] Each experimental observation will contain some error, $\varepsilon$, and so we may specify an empirical model for our observations, $y_i = kF_i + \varepsilon_i$.

There are many methods we might use to estimate the unknown parameter k. Since the n equations in our data comprise an overdetermined system with one unknown and n equations, we estimate k using least squares. The sum of squares to be minimized is[10] $S = \sum_{i=1}^{n} \left(y_i - kF_i\right)^2$.

The least squares estimate of the force constant, k, is given by $\hat{k} = \frac{\sum_{i} F_i y_i}{\sum_{i} F_i^2}$.

We assume that applying force causes the spring to expand. After having derived the force constant by least squares fitting, we predict the extension from Hooke's law.
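A minimal sketch of the spring example in Python follows. The force constant is estimated as the sum of the products of force and extension divided by the sum of squared forces, which is what the gradient equation gives for a one-parameter model through the origin. The measurements are made-up illustrative values in arbitrary units, and the function name is an assumption.

```python
def estimate_force_constant(forces, extensions):
    """Least-squares estimate of k in the model y = k * F."""
    numerator = sum(f * y for f, y in zip(forces, extensions))
    denominator = sum(f * f for f in forces)
    return numerator / denominator


# Illustrative (made-up) measurements.
forces = [1.0, 2.0, 3.0, 4.0, 5.0]
extensions = [0.52, 0.98, 1.51, 2.03, 2.49]

k_hat = estimate_force_constant(forces, extensions)
# Predict the extension for a force that was not measured.
predicted_extension = k_hat * 6.0
print(f"k = {k_hat:.4f}, predicted extension at F = 6: {predicted_extension:.4f}")
```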

Uncertainty quantification

In a least squares calculation with unit weights, or in linear regression, the variance on the jth parameter, denoted $\operatorname{var}(\hat{\beta}_j)$, is usually estimated with $\operatorname{var}(\hat{\beta}_j) = \sigma^2 \left([X^{\mathsf{T}} X]^{-1}\right)_{jj} \approx \hat{\sigma}^2 C_{jj}$, $\hat{\sigma}^2 \approx \frac{S}{n - m}$, $C = (X^{\mathsf{T}} X)^{-1}$, where the true error variance $\sigma^2$ is replaced by an estimate, the reduced chi-squared statistic, based on the minimized value of the residual sum of squares (objective function), $S$. The denominator, $n - m$, is the statistical degrees of freedom; see effective degrees of freedom for generalizations.[10] $C$ is the covariance matrix.
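For a one-parameter model through the origin (such as y = k·F), the matrix inverse reduces to the reciprocal of the sum of squared inputs, so the variance estimate can be sketched without any linear algebra. The function name and data below are illustrative assumptions; with m = 1 the degrees of freedom are n − 1.

```python
def fit_with_variance(forces, extensions):
    """Return (k_hat, var_k) for the one-parameter model y = k * F."""
    n = len(forces)
    sff = sum(f * f for f in forces)
    k_hat = sum(f * y for f, y in zip(forces, extensions)) / sff
    # Minimized residual sum of squares S.
    s = sum((y - k_hat * f) ** 2 for f, y in zip(forces, extensions))
    sigma2_hat = s / (n - 1)   # S / (n - m) with m = 1 parameter
    var_k = sigma2_hat / sff   # C = 1 / sum(F_i^2) in the scalar case
    return k_hat, var_k


# Illustrative (made-up) measurements.
forces = [1.0, 2.0, 3.0, 4.0]
extensions = [1.1, 1.9, 3.1, 3.9]
k_hat, var_k = fit_with_variance(forces, extensions)
print(f"k = {k_hat:.5f}, estimated standard error = {var_k ** 0.5:.5f}")
```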

Statistical testing

If the probability distribution of the parameters is known or an asymptotic approximation is made, confidence limits can be found. Similarly, statistical tests on the residuals can be conducted if the probability distribution of the residuals is known or assumed. We can derive the probability distribution of any linear combination of the dependent variables if the probability distribution of experimental errors is known or assumed. Inference is straightforward when the errors are assumed to follow a normal distribution, which implies that the parameter estimates and residuals will also be normally distributed conditional on the values of the independent variables.[10]

It is necessary to make assumptions about the nature of the experimental errors to test the results statistically. A common assumption is that the errors belong to a normal distribution. The central limit theorem supports the idea that this is a good approximation in many cases.

  • The Gauss–Markov theorem. In a linear model in which the errors have expectation zero conditional on the independent variables, are uncorrelated and have equal variances, the best linear unbiased estimator of any linear combination of the observations is its least-squares estimator. "Best" means that the least squares estimators of the parameters have minimum variance. The assumption of equal variance is valid when the errors all belong to the same distribution.[13]
  • If the errors belong to a normal distribution, the least-squares estimators are also the maximum likelihood estimators in a linear model.

However, suppose the errors are not normally distributed. In that case, a central limit theorem often nonetheless implies that the parameter estimates will be approximately normally distributed so long as the sample is reasonably large. For this reason, given the important property that the error mean is independent of the independent variables, the distribution of the error term is not an important issue in regression analysis. Specifically, it is not typically important whether the error term follows a normal distribution.

Weighted least squares

"Fanning Out" Effect of Heteroscedasticity

A special case of generalized least squares called weighted least squares occurs when all the off-diagonal entries of Ω (the correlation matrix of the residuals) are null; the variances of the observations (along the covariance matrix diagonal) may still be unequal (heteroscedasticity). In simpler terms, heteroscedasticity is when the variance of $Y_i$ depends on the value of $x_i$, which causes the residual plot to show a "fanning out" effect towards larger $Y_i$ values. Homoscedasticity, on the other hand, is the assumption that the variance of $Y_i$ and the variance of $U_i$ are equal.[8]
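One common way to handle unequal variances is to minimize a weighted sum of squared residuals, with each weight taken as the reciprocal of that observation's error variance. The sketch below fits a straight line this way. The weighting rule, the function name, the data and the assumed variances are all illustrative choices rather than something specified in the text above.

```python
def weighted_line_fit(xs, ys, weights):
    """Return (intercept, slope) minimizing sum(w_i * r_i**2)."""
    w_total = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_total
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative data whose scatter grows with x ("fanning out").
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.3, 7.6, 10.9]
variances = [0.1, 0.2, 0.4, 0.8, 1.6]   # assumed known error variances
weights = [1.0 / v for v in variances]

wls_b0, wls_b1 = weighted_line_fit(xs, ys, weights)
# Equal weights reduce the weighted fit to ordinary least squares.
ols_b0, ols_b1 = weighted_line_fit(xs, ys, [1.0] * len(xs))
print(f"WLS: {wls_b0:.3f} + {wls_b1:.3f} x   OLS: {ols_b0:.3f} + {ols_b1:.3f} x")
```

The weighted fit leans on the low-variance observations at small x, so its coefficients generally differ from the unweighted ones when the scatter is uneven.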

Relationship to principal components

The first principal component about the mean of a set of points can be represented by that line which most closely approaches the data points (as measured by squared distance of closest approach, i.e. perpendicular to the line). In contrast, linear least squares tries to minimize the distance in the $y$ direction only. Thus, although the two use a similar error metric, linear least squares is a method that treats one dimension of the data preferentially, while PCA treats all dimensions equally.
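The difference can be seen numerically on a small two-dimensional data set. The sketch below computes the ordinary least-squares slope and the slope of the first principal axis, using the standard closed form for the orientation of the major axis of a 2×2 scatter matrix. The function name and the four points are illustrative assumptions, and the tangent step would break down if the principal axis were vertical.

```python
import math


def ols_and_pca_slopes(points):
    """Return (OLS slope of y on x, slope of the first principal axis)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    # OLS minimizes vertical distances only.
    ols_slope = sxy / sxx
    # The major axis of the scatter matrix minimizes perpendicular distances.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pca_slope = math.tan(angle)
    return ols_slope, pca_slope


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
ols_slope, pca_slope = ols_and_pca_slopes(points)
print(f"OLS slope = {ols_slope:.4f}, first principal axis slope = {pca_slope:.4f}")
```

For these points the principal axis is steeper than the regression line, because it balances errors in both coordinates rather than attributing all of the error to y.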

Relationship to measure theory

Notable statistician Sara van de Geer used empirical process theory and the Vapnik–Chervonenkis dimension to prove a least-squares estimator can be interpreted as a measure on the space of square-integrable functions.[14]

Regularization

Tikhonov regularization

In some contexts, a regularized version of the least squares solution may be preferable. Tikhonov regularization (or ridge regression) adds a constraint that $\|\boldsymbol{\beta}\|_2^2$, the squared $\ell_2$-norm of the parameter vector, is not greater than a given value to the least squares formulation, leading to a constrained minimization problem. This is equivalent to the unconstrained minimization problem where the objective function is the residual sum of squares plus a penalty term $\alpha \|\boldsymbol{\beta}\|_2^2$ and $\alpha$ is a tuning parameter (this is the Lagrangian form of the constrained minimization problem).[15]

In a Bayesian context, this is equivalent to placing a zero-mean normally distributed prior on the parameter vector.
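For the penalized form, setting the gradient of the residual sum of squares plus the squared-norm penalty to zero adds the tuning parameter to the diagonal of the normal-equation matrix. The sketch below assumes two features and no intercept so that the 2×2 system can be solved directly; the function name, data and penalty value are illustrative.

```python
def ridge_two_features(rows, ys, alpha):
    """Minimize sum(r_i**2) + alpha * (b1**2 + b2**2) for rows of (x1, x2)."""
    # Entries of X^T X + alpha * I and of X^T y.
    a11 = sum(x1 * x1 for x1, _ in rows) + alpha
    a12 = sum(x1 * x2 for x1, x2 in rows)
    a22 = sum(x2 * x2 for _, x2 in rows) + alpha
    g1 = sum(x1 * y for (x1, _), y in zip(rows, ys))
    g2 = sum(x2 * y for (_, x2), y in zip(rows, ys))
    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    return (g1 * a22 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det


rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [1.0, 2.0, 3.0]

unpenalized = ridge_two_features(rows, ys, alpha=0.0)
penalized = ridge_two_features(rows, ys, alpha=1.0)
print(unpenalized, penalized)
```

With the penalty switched on, both coefficients shrink toward zero but neither becomes exactly zero.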

Lasso method

An alternative regularized version of least squares is Lasso (least absolute shrinkage and selection operator), which uses the constraint that $\|\boldsymbol{\beta}\|_1$, the L1-norm of the parameter vector, is no greater than a given value.[16][17][18] (As above, one can show using Lagrange multipliers that this is equivalent to an unconstrained minimization of the least-squares penalty with $\alpha \|\boldsymbol{\beta}\|_1$ added.) In a Bayesian context, this is equivalent to placing a zero-mean Laplace prior distribution on the parameter vector.[19] The optimization problem may be solved using quadratic programming or more general convex optimization methods, as well as by specific algorithms such as the least angle regression algorithm.

One of the prime differences between Lasso and ridge regression is that in ridge regression, as the penalty is increased, all parameters are reduced while still remaining non-zero, while in Lasso, increasing the penalty will cause more and more of the parameters to be driven to zero. This is an advantage of Lasso over ridge regression, as driving parameters to zero deselects the features from the regression. Thus, Lasso automatically selects more relevant features and discards the others, whereas ridge regression never fully discards any features. Some feature selection techniques are developed based on the Lasso, including Bolasso, which bootstraps samples,[20] and FeaLect, which analyzes the regression coefficients corresponding to different values of $\alpha$ to score all the features.[21]

The L1-regularized formulation is useful in some contexts due to its tendency to prefer solutions where more parameters are zero, which gives solutions that depend on fewer variables.[16] For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. An extension of this approach is elastic net regularization.
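The tendency to produce exact zeros can be sketched with cyclic coordinate descent, one of several algorithms that can be applied to this problem (it is swapped in here for brevity and is not one of the methods named above). Each coordinate update is a soft-thresholding step. The objective is taken to be the residual sum of squares plus the tuning parameter times the L1-norm; the function names, data and penalty value are illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, ys, alpha, sweeps=100):
    """Minimize sum(r_i**2) + alpha * sum(|b_j|) by cyclic coordinate descent."""
    m = len(rows[0])
    beta = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            # Inner product of feature j with the residual that ignores feature j.
            rho = sum(
                row[j] * (y - sum(row[k] * beta[k] for k in range(m) if k != j))
                for row, y in zip(rows, ys)
            )
            z = sum(row[j] ** 2 for row in rows)
            beta[j] = soft_threshold(rho, alpha / 2.0) / z
    return beta


# Two orthogonal features; the second one has only a weak effect.
rows = [(1.0, 1.0), (1.0, -1.0)]
ys = [3.5, 2.5]

beta_ols = lasso_coordinate_descent(rows, ys, alpha=0.0)
beta_lasso = lasso_coordinate_descent(rows, ys, alpha=4.0)
print(beta_ols, beta_lasso)
```

With no penalty both coefficients are non-zero; with the penalty the weaker coefficient is set exactly to zero while the stronger one is only shrunk.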

References

  1. Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Belknap Press of Harvard University Press. ISBN 978-0-674-40340-6.
  2. Buchwald, Jed Z.; Feingold, Mordechai (2013). Newton and the Origin of Civilization. Princeton, Oxford: Princeton University Press. pp. 90–93, 101–103. ISBN 978-0-691-15478-7.
  3. Drum, Kevin (2025-08-14). "The Groundbreaking Isaac Newton Invention You've Never Heard Of". Mother Jones. Retrieved 2025-08-14.
  4. Belenkiy, Ari; Echague, Eduardo Vila (2008). "Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations". arXiv:0810.4948 [physics.hist-ph].
  5. Legendre, Adrien-Marie (1805). Nouvelles méthodes pour la détermination des orbites des comètes [New Methods for the Determination of the Orbits of Comets] (in French). Paris: F. Didot. hdl:2027/nyp.33433069112559.
  6. "The Discovery of Statistical Regression". Priceonomics. 2025-08-14. Retrieved 2025-08-14.
  7. Aldrich, J. (1998). "Doing Least Squares: Perspectives from Gauss and Yule". International Statistical Review. 66 (1): 61–81. doi:10.1111/j.1751-5823.1998.tb00406.x. S2CID 121471194.
  8. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588.
  9. For a good introduction to error-in-variables, please see Fuller, W. A. (1987). Measurement Error Models. John Wiley & Sons. ISBN 978-0-471-86187-4.
  10. Williams, Jeffrey H. (November 2016). Quantifying Measurement: The Tyranny of Numbers. San Rafael, CA: Morgan & Claypool Publishers, Institute of Physics. ISBN 978-1-68174-433-9. OCLC 962422324.
  11. Rencher, Alvin C.; Christensen, William F. (2025-08-14). Methods of Multivariate Analysis. John Wiley & Sons. p. 155. ISBN 978-1-118-39167-9.
  12. Gere, James M.; Goodno, Barry J. (2013). Mechanics of Materials (8th ed.). Stamford, Conn.: Cengage Learning. ISBN 978-1-111-57773-5. OCLC 741541348.
  13. Hallin, Marc (2012). "Gauss-Markov Theorem". Encyclopedia of Environmetrics. Wiley. doi:10.1002/9780470057339.vnn102. ISBN 978-0-471-89997-6. Retrieved 18 October 2023.
  14. van de Geer, Sara (June 1987). "A New Approach to Least-Squares Estimation, with Applications". Annals of Statistics. 15 (2): 587–602. doi:10.1214/aos/1176350362. S2CID 123088844.
  15. van Wieringen, Wessel N. (2021). "Lecture notes on ridge regression". arXiv:1509.09169 [stat.ME].
  16. Tibshirani, R. (1996). "Regression shrinkage and selection via the lasso". Journal of the Royal Statistical Society, Series B. 58 (1): 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x. JSTOR 2346178.
  17. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome H. (2009). The Elements of Statistical Learning (second ed.). Springer-Verlag. ISBN 978-0-387-84858-7. Archived from the original on 2025-08-14.
  18. Bühlmann, Peter; van de Geer, Sara (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer. ISBN 9783642201929.
  19. Park, Trevor; Casella, George (2008). "The Bayesian Lasso". Journal of the American Statistical Association. 103 (482): 681–686. doi:10.1198/016214508000000337. S2CID 11797924.
  20. Bach, Francis R. (2008). "Bolasso". Proceedings of the 25th International Conference on Machine Learning - ICML '08. pp. 33–40. arXiv:0804.1302. Bibcode:2008arXiv0804.1302B. doi:10.1145/1390156.1390161. ISBN 9781605582054. S2CID 609778.
  21. Zare, Habil (2013). "Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis". BMC Genomics. 14 (Suppl 1): S14. doi:10.1186/1471-2164-14-S1-S14. PMC 3549810. PMID 23369194.
