Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) the probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, (v) Box–Cox transformed regressors ($m(x, \theta_i) = \theta_1 + \theta_2 x^{(\theta_3)}$).

Theory

Consider a set of $m$ data points, $(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)$, and a curve (model function) $\hat{y} = f(x, \boldsymbol{\beta})$, that in addition to the variable $x$ also depends on $n$ parameters, $\boldsymbol{\beta} = (\beta_1, \beta_2, \dots, \beta_n)$, with $m \ge n$. It is desired to find the vector $\boldsymbol{\beta}$ of parameters such that the curve fits best the given data in the least squares sense, that is, the sum of squares

$$S = \sum_{i=1}^{m} r_i^2$$

is minimized, where the residuals (in-sample prediction errors) $r_i$ are given by

$$r_i = y_i - f(x_i, \boldsymbol{\beta})$$

for $i = 1, 2, \dots, m.$

The minimum value of S occurs when the gradient is zero. Since the model contains n parameters there are n gradient equations:

$$\frac{\partial S}{\partial \beta_j} = 2 \sum_{i} r_i \frac{\partial r_i}{\partial \beta_j} = 0 \quad (j = 1, \dots, n).$$

In a nonlinear system, the derivatives $\frac{\partial r_i}{\partial \beta_j}$ are functions of both the independent variable and the parameters, so in general these gradient equations do not have a closed solution. Instead, initial values must be chosen for the parameters. Then, the parameters are refined iteratively, that is, the values are obtained by successive approximation,

$$\beta_j \approx \beta_j^{k+1} = \beta_j^{k} + \Delta\beta_j.$$

Here, k is an iteration number and the vector of increments, $\Delta\boldsymbol{\beta}$, is known as the shift vector. At each iteration the model is linearized by approximation to a first-order Taylor polynomial expansion about $\boldsymbol{\beta}^{k}$:

$$f(x_i, \boldsymbol{\beta}) \approx f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j} \frac{\partial f(x_i, \boldsymbol{\beta}^{k})}{\partial \beta_j} \left(\beta_j - \beta_j^{k}\right) = f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j} J_{ij}\, \Delta\beta_j.$$

The Jacobian matrix, J, is a function of constants, the independent variable and the parameters, so it changes from one iteration to the next. Thus, in terms of the linearized model, $J_{ij} = -\frac{\partial r_i}{\partial \beta_j}$, and the residuals are given by

$$\Delta y_i = y_i - f(x_i, \boldsymbol{\beta}^{k}), \qquad r_i = \Delta y_i - \sum_{s=1}^{n} J_{is}\, \Delta\beta_s.$$

Substituting these expressions into the gradient equations, they become

$$-2 \sum_{i=1}^{m} J_{ij} \left(\Delta y_i - \sum_{s=1}^{n} J_{is}\, \Delta\beta_s\right) = 0,$$

which, on rearrangement, become n simultaneous linear equations, the normal equations:

$$\sum_{i=1}^{m} \sum_{s=1}^{n} J_{ij} J_{is}\, \Delta\beta_s = \sum_{i=1}^{m} J_{ij}\, \Delta y_i \quad (j = 1, \dots, n).$$

The normal equations are written in matrix notation as

$$\left(\mathbf{J}^{\mathsf{T}} \mathbf{J}\right) \Delta\boldsymbol{\beta} = \mathbf{J}^{\mathsf{T}}\, \Delta\mathbf{y}.$$

These equations form the basis for the Gauss–Newton algorithm for a non-linear least squares problem.
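
To make the iteration concrete, the following is a minimal sketch of the unweighted Gauss–Newton loop in Python with NumPy; the function names, model, and data are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def gauss_newton(f, jac, x, y, beta0, max_iter=50, tol=1e-4):
    """Minimal Gauss-Newton sketch: f(x, beta) evaluates the model and
    jac(x, beta) returns the m-by-n Jacobian J_ij = df(x_i, beta)/dbeta_j."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        dy = y - f(x, beta)                    # residuals Delta y
        J = jac(x, beta)                       # Jacobian at current parameters
        # Normal equations: (J^T J) dbeta = J^T dy
        dbeta = np.linalg.solve(J.T @ J, J.T @ dy)
        beta += dbeta
        # Stop when every parameter changes by less than tol (relative)
        if np.max(np.abs(dbeta) / np.maximum(np.abs(beta), 1e-12)) < tol:
            break
    return beta

# Hypothetical example: fit y = a * exp(b * x)
f = lambda x, b: b[0] * np.exp(b[1] * x)
jac = lambda x, b: np.column_stack([np.exp(b[1] * x),
                                    b[0] * x * np.exp(b[1] * x)])
x = np.linspace(0.0, 1.0, 20)
rng = np.random.default_rng(0)
y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.standard_normal(20)
print(gauss_newton(f, jac, x, y, beta0=[1.0, 1.0]))
```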

Note the sign convention in the definition of the Jacobian matrix in terms of the derivatives. Formulas linear in $J$ may appear with a factor of $-1$ in other articles or the literature.

Extension by weights

When the observations are not equally reliable, a weighted sum of squares may be minimized,

$$S = \sum_{i=1}^{m} W_{ii}\, r_i^2.$$

Each element of the diagonal weight matrix W should, ideally, be equal to the reciprocal of the error variance of the measurement.[1] The normal equations are then, more generally,

$$\left(\mathbf{J}^{\mathsf{T}} \mathbf{W} \mathbf{J}\right) \Delta\boldsymbol{\beta} = \mathbf{J}^{\mathsf{T}} \mathbf{W}\, \Delta\mathbf{y}.$$
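
As a sketch of how the weights enter the computation, the weighted normal equations can be assembled as below, storing the diagonal of W as a vector of weights; the function name is an illustrative assumption.

```python
import numpy as np

def weighted_gn_step(J, dy, w):
    """One weighted Gauss-Newton step: solve (J^T W J) dbeta = J^T W dy,
    with W = diag(w) and w_i = 1 / sigma_i^2 (reciprocal error variances)."""
    JW = J.T * w                  # row-scaling; equivalent to J.T @ np.diag(w)
    return np.linalg.solve(JW @ J, JW @ dy)
```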

Geometrical interpretation

In linear least squares the objective function, S, is a quadratic function of the parameters,

$$S = \sum_{i} W_{ii} \left(y_i - \sum_{j} X_{ij} \beta_j\right)^2.$$

When there is only one parameter the graph of S with respect to that parameter will be a parabola. With two or more parameters the contours of S with respect to any pair of parameters will be concentric ellipses (assuming that the normal equations matrix $\mathbf{X}^{\mathsf{T}} \mathbf{W} \mathbf{X}$ is positive definite). The minimum parameter values are to be found at the centre of the ellipses. The geometry of the general objective function is that of an elliptical paraboloid. In NLLSQ the objective function is quadratic with respect to the parameters only in a region close to its minimum value, where the truncated Taylor series is a good approximation to the model,

$$S \approx \sum_{i} W_{ii} \left(y_i - \sum_{j} J_{ij} \beta_j\right)^2.$$

The more the parameter values differ from their optimal values, the more the contours deviate from elliptical shape. A consequence of this is that initial parameter estimates should be as close as practicable to their (unknown!) optimal values. It also explains how divergence can come about, as the Gauss–Newton algorithm is convergent only when the objective function is approximately quadratic in the parameters.

Computation

Initial parameter estimates

Some problems of ill-conditioning and divergence can be corrected by finding initial parameter estimates that are near to the optimal values. A good way to do this is by computer simulation: both the observed and calculated data are displayed on a screen, and the parameters of the model are adjusted by hand until the agreement between observed and calculated data is reasonably good. Although this is a subjective judgment, it is sufficient to find a good starting point for the non-linear refinement. Initial parameter estimates can also be created using transformations or linearizations. Better still, evolutionary algorithms such as the Stochastic Funnel Algorithm can lead to the convex basin of attraction that surrounds the optimal parameter estimates.[citation needed] Hybrid algorithms that use randomization and elitism, followed by Newton methods, have been shown to be useful and computationally efficient.[citation needed]

Solution

Any method among the ones described below can be applied to find a solution.

Convergence criteria

The common sense criterion for convergence is that the sum of squares does not increase from one iteration to the next. However, this criterion is often difficult to implement in practice, for various reasons. A useful convergence criterion is

$$\left|\frac{S^{k} - S^{k+1}}{S^{k}}\right| < 0.0001.$$

The value 0.0001 is somewhat arbitrary and may need to be changed. In particular it may need to be increased when experimental errors are large. An alternative criterion is

$$\left|\frac{\Delta\beta_j}{\beta_j}\right| < 0.001 \quad (j = 1, \dots, n).$$

Again, the numerical value is somewhat arbitrary; 0.001 is equivalent to specifying that each parameter should be refined to 0.1% precision. This is reasonable when it is less than the largest relative standard deviation on the parameters.
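
A hedged sketch of how these two criteria might be coded, assuming NumPy and nonzero parameters; the thresholds default to the values quoted in the text, and the function name is illustrative.

```python
import numpy as np

def converged(S_prev, S_curr, dbeta, beta, tol_S=1e-4, tol_beta=1e-3):
    """Test the two criteria from the text: relative change in the sum of
    squares, and relative change in every parameter. Either one may be
    used on its own; this sketch accepts whichever is satisfied first."""
    rel_S = abs(S_prev - S_curr) / S_prev          # assumes S_prev > 0
    rel_beta = np.max(np.abs(dbeta / beta))        # assumes no beta_j == 0
    return rel_S < tol_S or rel_beta < tol_beta
```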

Calculation of the Jacobian by numerical approximation

There are models for which it is either very difficult or even impossible to derive analytical expressions for the elements of the Jacobian. Then, the numerical approximation

$$\frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} \approx \frac{\delta f(x_i, \boldsymbol{\beta})}{\delta \beta_j}$$

is obtained by calculation of $f(x_i, \boldsymbol{\beta})$ for $\beta_j$ and $\beta_j + \delta\beta_j$. The size of the increment, $\delta\beta_j$, should be chosen so that the numerical derivative is not subject to approximation error, by being too large, or round-off error, by being too small.
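
A forward-difference sketch of this approximation; the default step size and its scaling with the parameter magnitude are common heuristics assumed here, not prescriptions from the text.

```python
import numpy as np

def numerical_jacobian(f, x, beta, delta=1e-6):
    """Forward-difference approximation of J_ij = df(x_i, beta)/dbeta_j.
    The step trades truncation error (too large) against round-off
    error (too small); scaling by max(1, |beta_j|) is one common rule."""
    beta = np.asarray(beta, dtype=float)
    f0 = f(x, beta)
    J = np.empty((f0.size, beta.size))
    for j in range(beta.size):
        step = delta * max(1.0, abs(beta[j]))
        bp = beta.copy()
        bp[j] += step
        J[:, j] = (f(x, bp) - f0) / step
    return J
```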

Parameter errors, confidence limits, residuals etc.

Some information is given in the corresponding section on the Weighted least squares page.

Multiple minima

Multiple minima can occur in a variety of circumstances, some of which are:

  • A parameter is raised to a power of two or more. For example, when fitting data to a Lorentzian curve

$$f(x_i, \boldsymbol{\beta}) = \frac{\alpha}{1 + \left(\frac{\gamma - x_i}{\beta}\right)^2},$$

where $\alpha$ is the height, $\gamma$ is the position and $\beta$ is the half-width at half height, there are two solutions for the half-width, $\hat\beta$ and $-\hat\beta$, which give the same optimal value for the objective function.
  • Two parameters can be interchanged without changing the value of the model. A simple example is when the model contains the product of two parameters, since $\alpha\beta$ will give the same value as $\beta\alpha$.
  • A parameter is in a trigonometric function, such as $\sin\beta$, which has identical values at $\hat\beta + 2n\pi$. See Levenberg–Marquardt algorithm for an example.

Not all multiple minima have equal values of the objective function. False minima, also known as local minima, occur when the objective function value is greater than its value at the so-called global minimum. To be certain that the minimum found is the global minimum, the refinement should be started with widely differing initial values of the parameters. When the same minimum is found regardless of starting point, it is likely to be the global minimum.
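
The multi-start strategy just described can be sketched as follows; `refine` stands for any local method (e.g. Gauss–Newton) and is an assumed user-supplied callable returning the refined parameters and the final sum of squares.

```python
import numpy as np

def multistart_fit(refine, starts):
    """Run a local refinement from several widely differing starting
    points and keep the best result; agreement among the runs is
    evidence (not proof) that the global minimum has been found."""
    best_beta, best_S = None, np.inf
    for beta0 in starts:
        beta, S = refine(beta0)   # refine returns (parameters, sum of squares)
        if S < best_S:
            best_beta, best_S = beta, S
    return best_beta, best_S
```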

When multiple minima exist there is an important consequence: the objective function will have a stationary point (e.g. a maximum or a saddle point) somewhere between two minima. The normal equations matrix is not positive definite at a stationary point in the objective function, because the gradient vanishes and no unique direction of descent exists. Refinement from a point (a set of parameter values) close to a stationary point will be ill-conditioned and should be avoided as a starting point. For example, when fitting a Lorentzian the normal equations matrix is not positive definite when the half-width of the Lorentzian is zero.[2]

Transformation to a linear model

A non-linear model can sometimes be transformed into a linear one. Such an approximation is, for instance, often applicable in the vicinity of the best estimator, and it is one of the basic assumptions in most iterative minimization algorithms. When a linear approximation is valid, the model can be used directly for inference with a generalized least squares, where the equations of the Linear Template Fit[3] apply.

Another example of a linear approximation would be when the model is a simple exponential function,

$$f(x_i, \boldsymbol{\beta}) = \alpha e^{\beta x_i},$$

which can be transformed into a linear model by taking logarithms,

$$\log f(x_i, \boldsymbol{\beta}) = \log \alpha + \beta x_i.$$

Graphically this corresponds to working on a semi-log plot. The sum of squares becomes

$$S = \sum_{i} \left(\log y_i - \log \alpha - \beta x_i\right)^2.$$

This procedure should be avoided unless the errors are multiplicative and log-normally distributed because it can give misleading results. This comes from the fact that whatever the experimental errors on y might be, the errors on log y are different. Therefore, when the transformed sum of squares is minimized, different results will be obtained both for the parameter values and their calculated standard deviations. However, with multiplicative errors that are log-normally distributed, this procedure gives unbiased and consistent parameter estimates.
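
As an illustration of the transformation (subject to the caveat just discussed, and useful in any case for generating starting values), the log-linearized fit reduces to ordinary linear least squares; the data here are hypothetical.

```python
import numpy as np

# Fit y = alpha * exp(beta * x) by linearizing: log y = log(alpha) + beta * x.
# This minimizes errors in log y, not in y, so it is only appropriate
# when the errors are multiplicative and log-normal.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.1, 4.3, 9.2, 19.8, 41.0])          # hypothetical data

A = np.column_stack([np.ones_like(x), x])          # design matrix [1, x]
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
alpha, beta = np.exp(coef[0]), coef[1]
print(alpha, beta)  # usable as starting values for a non-linear refinement
```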

Another example is furnished by Michaelis–Menten kinetics, used to determine two parameters $V_{\max}$ and $K_m$:

$$v = \frac{V_{\max} [S]}{K_m + [S]}.$$

The Lineweaver–Burk plot

$$\frac{1}{v} = \frac{1}{V_{\max}} + \frac{K_m}{V_{\max} [S]}$$

of $\frac{1}{v}$ against $\frac{1}{[S]}$ is linear in the parameters $\frac{1}{V_{\max}}$ and $\frac{K_m}{V_{\max}}$, but very sensitive to data error and strongly biased toward fitting the data in a particular range of the independent variable $[S]$.

Algorithms

Gauss–Newton method

The normal equations

$$\left(\mathbf{J}^{\mathsf{T}} \mathbf{W} \mathbf{J}\right) \Delta\boldsymbol{\beta} = \mathbf{J}^{\mathsf{T}} \mathbf{W}\, \Delta\mathbf{y}$$

may be solved for $\Delta\boldsymbol{\beta}$ by Cholesky decomposition, as described in linear least squares. The parameters are updated iteratively,

$$\boldsymbol{\beta}^{k+1} = \boldsymbol{\beta}^{k} + \Delta\boldsymbol{\beta},$$

where k is an iteration number. While this method may be adequate for simple models, it will fail if divergence occurs. Therefore, protection against divergence is essential.
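
A sketch of the Cholesky-based solve of the (optionally weighted) normal equations using NumPy's factorization; in production code a dedicated triangular solver would be preferable, and the function name is illustrative.

```python
import numpy as np

def solve_normal_equations(J, dy, w=None):
    """Solve (J^T W J) dbeta = J^T W dy via Cholesky factorization;
    assumes the normal-equations matrix is positive definite."""
    if w is None:
        w = np.ones(len(dy))
    A = (J.T * w) @ J
    b = (J.T * w) @ dy
    L = np.linalg.cholesky(A)        # A = L L^T, with L lower triangular
    z = np.linalg.solve(L, b)        # solve L z = b
    return np.linalg.solve(L.T, z)   # solve L^T dbeta = z
```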

Shift-cutting

If divergence occurs, a simple expedient is to reduce the length of the shift vector, $\Delta\boldsymbol{\beta}$, by a fraction, f:

$$\boldsymbol{\beta}^{k+1} = \boldsymbol{\beta}^{k} + f\, \Delta\boldsymbol{\beta}.$$

For example, the length of the shift vector may be successively halved until the new value of the objective function is less than its value at the last iteration. The fraction, f, could be optimized by a line search.[4] As each trial value of f requires the objective function to be re-calculated it is not worth optimizing its value too stringently.
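
Step-halving can be sketched as below; `objective(beta)` is an assumed callable returning the sum of squares S at the given parameters, and the cut-off `min_f` is an illustrative choice.

```python
def shift_cut(beta, dbeta, objective, f=1.0, min_f=1e-4):
    """Successively halve the step length until the sum of squares
    decreases: a crude line search along the Gauss-Newton direction."""
    S0 = objective(beta)
    while f > min_f:
        trial = beta + f * dbeta
        if objective(trial) < S0:
            return trial
        f *= 0.5
    return beta  # give up: no acceptable step found along this direction
```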

When using shift-cutting, the direction of the shift vector remains unchanged. This limits the applicability of the method to situations where the direction of the shift vector is not very different from what it would be if the objective function were approximately quadratic in the parameters, $\boldsymbol{\beta}^{k}$.

Marquardt parameter

If divergence occurs and the direction of the shift vector is so far from its "ideal" direction that shift-cutting is not very effective, that is, the fraction, f, required to avoid divergence is very small, the direction must be changed. This can be achieved by using the Marquardt parameter.[5] In this method the normal equations are modified:

$$\left(\mathbf{J}^{\mathsf{T}} \mathbf{W} \mathbf{J} + \lambda \mathbf{I}\right) \Delta\boldsymbol{\beta} = \mathbf{J}^{\mathsf{T}} \mathbf{W}\, \Delta\mathbf{y},$$

where $\lambda$ is the Marquardt parameter and I is an identity matrix. Increasing the value of $\lambda$ has the effect of changing both the direction and the length of the shift vector. The shift vector is rotated towards the direction of steepest descent when

$$\lambda \mathbf{I} \gg \mathbf{J}^{\mathsf{T}} \mathbf{W} \mathbf{J};$$

$\mathbf{J}^{\mathsf{T}} \mathbf{W}\, \Delta\mathbf{y}$ is the steepest descent vector. So, when $\lambda$ becomes very large, the shift vector becomes a small fraction of the steepest descent vector.

Various strategies have been proposed for the determination of the Marquardt parameter. As with shift-cutting, it is wasteful to optimize this parameter too stringently. Rather, once a value has been found that brings about a reduction in the value of the objective function, that value of the parameter is carried to the next iteration, reduced if possible, or increased if need be. When reducing the value of the Marquardt parameter, there is a cut-off value below which it is safe to set it to zero, that is, to continue with the unmodified Gauss–Newton method. The cut-off value may be set equal to the smallest singular value of the Jacobian.[6] A bound for this value is given by

$$\frac{1}{\operatorname{tr}\left[\left(\mathbf{J}^{\mathsf{T}} \mathbf{W} \mathbf{J}\right)^{-1}\right]},$$

where tr is the trace function.[7]
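
A sketch of the modified normal equations together with one common (but by no means unique) control strategy for $\lambda$; the function name and update factors are illustrative assumptions.

```python
import numpy as np

def marquardt_step(J, dy, lam, w=None):
    """Levenberg-Marquardt modification of the normal equations:
    (J^T W J + lambda I) dbeta = J^T W dy. A large lambda rotates the
    step towards steepest descent and shrinks its length."""
    if w is None:
        w = np.ones(len(dy))
    A = (J.T * w) @ J
    return np.linalg.solve(A + lam * np.eye(A.shape[0]), (J.T * w) @ dy)

# Typical control strategy (one common variant): if the step reduces S,
# accept it and divide lambda by ~10; otherwise reject it and multiply
# lambda by ~10, recomputing the step at the same parameters.
```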

QR decomposition

The minimum in the sum of squares can be found by a method that does not involve forming the normal equations. The residuals with the linearized model can be written as

$$\mathbf{r} = \Delta\mathbf{y} - \mathbf{J}\, \Delta\boldsymbol{\beta}.$$

The Jacobian is subjected to an orthogonal decomposition; the QR decomposition will serve to illustrate the process:

$$\mathbf{J} = \mathbf{Q} \mathbf{R},$$

where Q is an orthogonal $m \times m$ matrix and R is an $m \times n$ matrix which is partitioned into an $n \times n$ block, $\mathbf{R}_n$, and an $(m - n) \times n$ zero block; $\mathbf{R}_n$ is upper triangular:

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_n \\ \mathbf{0} \end{bmatrix}.$$

The residual vector is left-multiplied by $\mathbf{Q}^{\mathsf{T}}$:

$$\mathbf{Q}^{\mathsf{T}} \mathbf{r} = \mathbf{Q}^{\mathsf{T}}\, \Delta\mathbf{y} - \mathbf{R}\, \Delta\boldsymbol{\beta} = \begin{bmatrix} \left(\mathbf{Q}^{\mathsf{T}}\, \Delta\mathbf{y} - \mathbf{R}_n\, \Delta\boldsymbol{\beta}\right)_n \\ \left(\mathbf{Q}^{\mathsf{T}}\, \Delta\mathbf{y}\right)_{m-n} \end{bmatrix}.$$

This has no effect on the sum of squares since $S = \mathbf{r}^{\mathsf{T}} \mathbf{Q} \mathbf{Q}^{\mathsf{T}} \mathbf{r} = \mathbf{r}^{\mathsf{T}} \mathbf{r}$ because Q is orthogonal. The minimum value of S is attained when the upper block is zero. Therefore, the shift vector is found by solving

$$\mathbf{R}_n\, \Delta\boldsymbol{\beta} = \left(\mathbf{Q}^{\mathsf{T}}\, \Delta\mathbf{y}\right)_n.$$

These equations are easily solved as R is upper triangular.
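
A sketch using NumPy's QR routine; the reduced factorization (Q of size m×n, R of size n×n) yields the same shift vector as the partitioned full form described above.

```python
import numpy as np

def qr_step(J, dy):
    """Solve the linearized least-squares problem min ||dy - J dbeta||
    by QR decomposition, avoiding the normal equations entirely."""
    Q, R = np.linalg.qr(J)               # reduced QR: Q is m-by-n, R is n-by-n
    return np.linalg.solve(R, Q.T @ dy)  # R dbeta = (Q^T dy)_n
```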

Singular value decomposition

A variant of the method of orthogonal decomposition involves singular value decomposition, in which R is diagonalized by further orthogonal transformations.

$$\mathbf{J} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\mathsf{T}},$$

where $\mathbf{U}$ is orthogonal, $\boldsymbol{\Sigma}$ is a diagonal matrix of singular values and $\mathbf{V}$ is the orthogonal matrix of the eigenvectors of $\mathbf{J}^{\mathsf{T}} \mathbf{J}$, or equivalently the right singular vectors of $\mathbf{J}$. In this case the shift vector is given by

$$\Delta\boldsymbol{\beta} = \mathbf{V} \boldsymbol{\Sigma}^{-1} \left(\mathbf{U}^{\mathsf{T}}\, \Delta\mathbf{y}\right)_n.$$

The relative simplicity of this expression is very useful in theoretical analysis of non-linear least squares. The application of singular value decomposition is discussed in detail in Lawson and Hanson.[6]
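
A sketch of the SVD-based shift vector, with optional truncation of small singular values; the `rcond` threshold is an illustrative choice that also regularizes ill-conditioned problems.

```python
import numpy as np

def svd_step(J, dy, rcond=1e-12):
    """Shift vector via SVD: dbeta = V Sigma^{-1} (U^T dy), keeping only
    singular values above rcond * s_max."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    keep = s > rcond * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ dy) / s[keep])
```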

Gradient methods

There are many examples in the scientific literature where different methods have been used for non-linear data-fitting problems.

  • Inclusion of second derivatives in the Taylor series expansion of the model function. This is Newton's method in optimization:

$$f(x_i, \boldsymbol{\beta}) \approx f(x_i, \boldsymbol{\beta}^{k}) + \sum_{j} J_{ij}\, \Delta\beta_j + \frac{1}{2} \sum_{j} \sum_{k} \Delta\beta_j\, \Delta\beta_k\, H_{jk(i)}.$$

The matrix H is known as the Hessian matrix. Although this model has better convergence properties near to the minimum, it is much worse when the parameters are far from their optimal values. Calculation of the Hessian adds to the complexity of the algorithm. This method is not in general use.
  • Davidon–Fletcher–Powell method. This method, a form of pseudo-Newton method, is similar to the one above but calculates the Hessian by successive approximation, to avoid having to use analytical expressions for the second derivatives.
  • Steepest descent. Although a reduction in the sum of squares is guaranteed when the shift vector points in the direction of steepest descent, this method often performs poorly. When the parameter values are far from optimal the direction of the steepest descent vector, which is normal (perpendicular) to the contours of the objective function, is very different from the direction of the Gauss–Newton vector. This makes divergence much more likely, especially as the minimum along the direction of steepest descent may correspond to a small fraction of the length of the steepest descent vector. When the contours of the objective function are very eccentric, due to there being high correlation between parameters, the steepest descent iterations, with shift-cutting, follow a slow, zig-zag trajectory towards the minimum. (A minimal step sketch follows this list.)
  • Conjugate gradient search. This is an improved steepest descent based method with good theoretical convergence properties, although it can fail on finite-precision digital computers even when used on quadratic problems.[8]
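
For comparison with the Gauss–Newton step, a steepest-descent step (third item above) uses the gradient direction directly; the step length `alpha` must be controlled externally, e.g. by shift-cutting, and is an illustrative parameter here.

```python
import numpy as np

def steepest_descent_step(J, dy, alpha, w=None):
    """Steepest-descent shift: dbeta = alpha * J^T W dy. Guaranteed to be
    a descent direction, but often slow and zig-zagging in practice."""
    if w is None:
        w = np.ones(len(dy))
    return alpha * ((J.T * w) @ dy)
```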

Direct search methods

Direct search methods depend on evaluations of the objective function at a variety of parameter values and do not use derivatives at all. They offer alternatives to the use of numerical derivatives in the Gauss–Newton method and gradient methods.

  • Alternating variable search.[4] Each parameter is varied in turn by adding a fixed or variable increment to it and retaining the value that brings about a reduction in the sum of squares. The method is simple and effective when the parameters are not highly correlated. It has very poor convergence properties, but may be useful for finding initial parameter estimates.
  • Nelder–Mead (simplex) search. A simplex in this context is a polytope of n + 1 vertices in n dimensions: a triangle on a plane, a tetrahedron in three-dimensional space, and so forth. Each vertex corresponds to a value of the objective function for a particular set of parameters. The shape and size of the simplex is adjusted by varying the parameters in such a way that the value of the objective function at the highest vertex always decreases. Although the sum of squares may initially decrease rapidly, the method can converge to a nonstationary point on quasiconvex problems, as shown by an example due to M. J. D. Powell. (A minimal library-based sketch follows this list.)
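
A minimal sketch using the Nelder–Mead implementation in SciPy (assumed available); the model and data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Derivative-free fit of y = a * exp(b * x) by applying Nelder-Mead
# directly to the sum of squares.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x)                # hypothetical noise-free data

def S(beta):
    r = y - beta[0] * np.exp(beta[1] * x)
    return r @ r

res = minimize(S, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)
```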

More detailed descriptions of these and other methods are available in Numerical Recipes, together with computer code in various languages.

References

  1. ^ This implies that the observations are uncorrelated. If the observations are correlated, the expression $S = \sum_{k} \sum_{j} r_k W_{kj} r_j$ applies. In this case the weight matrix should ideally be equal to the inverse of the error variance-covariance matrix of the observations.
  2. ^ In the absence of round-off error and of experimental error in the independent variable, the normal equations matrix would be singular.
  3. ^ Britzger, Daniel (2022). "The Linear Template Fit". Eur. Phys. J. C. 82 (8): 731. arXiv:2112.01548. Bibcode:2022EPJC...82..731B. doi:10.1140/epjc/s10052-022-10581-w.
  4. ^ a b M. J. Box, D. Davies and W. H. Swann, Non-Linear Optimisation Techniques, Oliver & Boyd, 1969.
  5. ^ This technique was proposed independently by Levenberg (1944), Girard (1958), Wynne (1959), Morrison (1960) and Marquardt (1963). Marquardt's name alone is used for it in much of the scientific literature. See the main article for citation references.
  6. ^ a b C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice–Hall, 1974.
  7. ^ R. Fletcher, UKAEA Report AERE-R 6799, H.M. Stationery Office, 1971.
  8. ^ M. J. D. Powell, Computer Journal, 7 (1964), 155.
