A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process.[1] When referring specifically to probabilities, the corresponding term is probabilistic model. All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally, statistical models are part of the foundation of statistical inference. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. As such, a statistical model is "a formal representation of a theory" (Herman Adèr quoting Kenneth Bollen).[2]

Introduction

Informally, a statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event. As an example, consider a pair of ordinary six-sided dice. We will study two different statistical assumptions about the dice.

The first statistical assumption is this: for each of the dice, the probability of each face (1, 2, 3, 4, 5, and 6) coming up is 1/6. From that assumption, we can calculate the probability of both dice coming up 5: 1/6 × 1/6 = 1/36. More generally, we can calculate the probability of any event: e.g. (1 and 2) or (3 and 3) or (5 and 6). The alternative statistical assumption is this: for each of the dice, the probability of the face 5 coming up is 1/8 (because the dice are weighted). From that assumption, we can calculate the probability of both dice coming up 5: 1/8 × 1/8 = 1/64. We cannot, however, calculate the probability of any other nontrivial event, as the probabilities of the other faces are unknown.

The first statistical assumption constitutes a statistical model because, with the assumption alone, we can calculate the probability of any event. The alternative statistical assumption does not constitute a statistical model because, with the assumption alone, we cannot calculate the probability of every event. In the example above, with the first assumption, calculating the probability of an event is easy. With some other examples, though, the calculation can be difficult, or even impractical (e.g. it might require millions of years of computation). For an assumption to constitute a statistical model, such difficulty is acceptable: doing the calculation does not need to be practicable, just theoretically possible.
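The fair-dice calculation can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` is hypothetical, and the two dice are assumed to be independent (which the products 1/6 × 1/6 above already presuppose). Exact fractions are used so that the results match the hand calculation.

```python
from fractions import Fraction
from itertools import product

# Fair-dice assumption: each face of each die has probability 1/6.
FACES = range(1, 7)
P_FACE = {face: Fraction(1, 6) for face in FACES}


def probability(event):
    """Probability that an ordered outcome (die1, die2) satisfies `event`.

    Assumption: the dice are independent, so the probability of an ordered
    pair is the product of the two face probabilities.
    """
    return sum(
        P_FACE[a] * P_FACE[b]
        for a, b in product(FACES, repeat=2)
        if event(a, b)
    )


both_five = probability(lambda a, b: a == 5 and b == 5)

# The event "(1 and 2) or (3 and 3) or (5 and 6)", ignoring which die shows which face.
targets = {(1, 2), (3, 3), (5, 6)}
mixed = probability(lambda a, b: tuple(sorted((a, b))) in targets)

print(both_five)  # 1/36
print(mixed)      # 5/36
```

Under the weighted-dice assumption the same function could not be written, because the dictionary of face probabilities would have only one known entry.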

Formal definition

In mathematical terms, a statistical model is a pair $(S, \mathcal{P})$, where $S$ is the set of possible observations, i.e. the sample space, and $\mathcal{P}$ is a set of probability distributions on $S$.[3] The set $\mathcal{P}$ represents all of the models that are considered possible. This set is typically parameterized: $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. The set $\Theta$ defines the parameters of the model. If a parameterization is such that distinct parameter values give rise to distinct distributions, i.e. $F_{\theta_1} = F_{\theta_2} \Rightarrow \theta_1 = \theta_2$ (in other words, the mapping is injective), it is said to be identifiable.[3]
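Identifiability can be illustrated numerically. The sketch below is not from the source: it assumes a Gaussian family parameterized by a mean `mu` and a scale `s` whose square is the variance. If `s` may be any nonzero real number, the parameter values `(mu, s)` and `(mu, -s)` give the same distribution, so the parameterization is not identifiable; restricting to `s > 0` removes the ambiguity. The grid comparison is a heuristic check, not a proof.

```python
import math


def gaussian_pdf(x, mu, s):
    """Density of a Gaussian with mean `mu` and variance `s**2` (s != 0)."""
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (math.sqrt(2 * math.pi) * abs(s))


def same_distribution(theta1, theta2, grid):
    """Compare two densities on a grid of points (a numerical heuristic)."""
    return all(
        math.isclose(gaussian_pdf(x, *theta1), gaussian_pdf(x, *theta2), rel_tol=1e-12)
        for x in grid
    )


grid = [i / 10 for i in range(-50, 51)]

# Distinct parameter values, identical distribution: the map from parameter
# to distribution is not injective when s ranges over all nonzero reals.
print(same_distribution((0.0, 1.5), (0.0, -1.5), grid))  # True

# With s restricted to s > 0, these distinct values give distinct distributions.
print(same_distribution((0.0, 1.5), (0.0, 2.0), grid))   # False
```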

In some cases, the model can be more complex.

  • In Bayesian statistics, the model is extended by adding a probability distribution over the parameter space $\Theta$.
  • A statistical model can sometimes distinguish two sets of probability distributions. The first set $\mathcal{Q} = \{F_\theta : \theta \in \Theta\}$ is the set of models considered for inference. The second set $\mathcal{P} = \{F_\lambda : \lambda \in \Lambda\}$ is the set of models that could have generated the data, and it is much larger than $\mathcal{Q}$. Such statistical models are key in checking that a given procedure is robust, i.e. that it does not produce catastrophic errors when its assumptions about the data are incorrect.

An example

Suppose that we have a population of children, with the ages of the children distributed uniformly in the population. The height of a child will be stochastically related to the age: e.g. when we know that a child is of age 7, this influences the chance of the child being 1.5 meters tall. We could formalize that relationship in a linear regression model, like this: $\text{height}_i = b_0 + b_1 \text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is a parameter that age is multiplied by to obtain a prediction of height, $\varepsilon_i$ is the error term, and $i$ identifies the child. This implies that height is predicted by age, with some error.

An admissible model must be consistent with all the data points. Thus, a straight line ($\text{height}_i = b_0 + b_1 \text{age}_i$) cannot be admissible for a model of the data unless it exactly fits all the data points, i.e. all the data points lie perfectly on the line. The error term, $\varepsilon_i$, must be included in the equation, so that the model is consistent with all the data points. To do statistical inference, we would first need to assume some probability distributions for the $\varepsilon_i$. For instance, we might assume that the $\varepsilon_i$ distributions are i.i.d. Gaussian, with zero mean. In this instance, the model would have 3 parameters: $b_0$, $b_1$, and the variance of the Gaussian distribution. We can formally specify the model in the form $(S, \mathcal{P})$ as follows. The sample space, $S$, of our model comprises the set of all possible pairs (age, height). Each possible value of $\theta = (b_0, b_1, \sigma^2)$ determines a distribution on $S$; denote that distribution by $F_\theta$. If $\Theta$ is the set of all possible values of $\theta$, then $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. (The parameterization is identifiable, and this is easy to check.)

In this example, the model is determined by (1) specifying $S$ and (2) making some assumptions relevant to $\mathcal{P}$. There are two assumptions: that height can be approximated by a linear function of age; that errors in the approximation are distributed as i.i.d. Gaussian. The assumptions are sufficient to specify $\mathcal{P}$, as they are required to do.
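As a rough illustration of this model, the following Python sketch simulates (age, height) pairs from the linear model with Gaussian errors and then recovers the three parameters by maximum likelihood. The "true" parameter values, the age range, and the function names `simulate` and `fit` are hypothetical and chosen only for demonstration; they do not come from the source.

```python
import random


def simulate(n, b0, b1, sigma, rng):
    """Draw n (age, height) pairs from height = b0 + b1*age + eps, eps ~ N(0, sigma^2)."""
    ages = [rng.uniform(2.0, 12.0) for _ in range(n)]  # ages uniform, as in the text
    heights = [b0 + b1 * a + rng.gauss(0.0, sigma) for a in ages]
    return ages, heights


def fit(ages, heights):
    """Maximum-likelihood estimates of (b0, b1, sigma^2) under Gaussian errors."""
    n = len(ages)
    mean_a = sum(ages) / n
    mean_h = sum(heights) / n
    sxx = sum((a - mean_a) ** 2 for a in ages)
    sxy = sum((a - mean_a) * (h - mean_h) for a, h in zip(ages, heights))
    b1 = sxy / sxx
    b0 = mean_h - b1 * mean_a
    residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
    sigma2 = sum(r * r for r in residuals) / n  # the MLE divides by n, not n - 2
    return b0, b1, sigma2


rng = random.Random(0)
# Hypothetical "true" parameters (metres and years), for illustration only.
ages, heights = simulate(5000, b0=0.75, b1=0.06, sigma=0.05, rng=rng)
b0_hat, b1_hat, sigma2_hat = fit(ages, heights)
print(b0_hat, b1_hat, sigma2_hat)
```

With a sample this large, the estimates should land close to the values used in the simulation, which reflects the fact that each parameter value picks out one distribution in the family.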

General remarks

A statistical model is a special class of mathematical model. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e. some of the variables are stochastic. In the above example with children's heights, ε is a stochastic variable; without that stochastic variable, the model would be deterministic. Statistical models are often used even when the data-generating process being modeled is deterministic. For instance, coin tossing is, in principle, a deterministic process; yet it is commonly modeled as stochastic (via a Bernoulli process). Choosing an appropriate statistical model to represent a given data-generating process is sometimes extremely difficult, and may require knowledge of both the process and relevant statistical analyses. Relatedly, the statistician Sir David Cox has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".[4]

There are three purposes for a statistical model, according to Konishi & Kitagawa:[5]

  1. Predictions
  2. Extraction of information
  3. Description of stochastic structures

Those three purposes are essentially the same as the three purposes indicated by Friendly & Meyer: prediction, estimation, description.[6]

Dimension of a model

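Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P} = \{F_\theta : \theta \in \Theta\}$. In notation, we write that $\Theta \subseteq \mathbb{R}^k$ where $k$ is a positive integer ($\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, $k$ is called the dimension of the model. The model is said to be parametric if $\Theta$ has finite dimension.[citation needed] As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that

$$\mathcal{P} = \left\{ F_{\mu,\sigma}(x) \equiv \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}.$$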

In this example, the dimension, k, equals 2. As another example, suppose that the data consists of points (x, y) that we assume are distributed according to a straight line with i.i.d. Gaussian residuals (with zero mean): this leads to the same statistical model as was used in the example with children's heights. The dimension of the statistical model is 3: the intercept of the line, the slope of the line, and the variance of the distribution of the residuals. (Note the set of all possible lines has dimension 2, even though geometrically, a line has dimension 1.)

Although formally $\theta \in \Theta$ is a single parameter that has dimension $k$, it is sometimes regarded as comprising $k$ separate parameters. For example, with the univariate Gaussian distribution, $\theta$ is formally a single parameter with dimension 2, but it is often regarded as comprising 2 separate parameters: the mean and the standard deviation. A statistical model is nonparametric if the parameter set $\Theta$ is infinite dimensional. A statistical model is semiparametric if it has both finite-dimensional and infinite-dimensional parameters. Formally, if $k$ is the dimension of $\Theta$ and $n$ is the number of samples, both semiparametric and nonparametric models have $k \to \infty$ as $n \to \infty$. If $k/n \to 0$ as $n \to \infty$, then the model is semiparametric; otherwise, the model is nonparametric.

Parametric models are by far the most commonly used statistical models. Regarding semiparametric and nonparametric models, Sir David Cox has said, "These typically involve fewer assumptions of structure and distributional form but usually contain strong assumptions about independencies".[7]

Nested models

Two statistical models are nested if the first model can be transformed into the second model by imposing constraints on the parameters of the first model. As an example, the set of all Gaussian distributions has, nested within it, the set of zero-mean Gaussian distributions: we constrain the mean in the set of all Gaussian distributions to get the zero-mean distributions. As a second example, the quadratic model

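$$y = b_0 + b_1 x + b_2 x^2 + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

has, nested within it, the linear model

$$y = b_0 + b_1 x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$

obtained by constraining the parameter $b_2$ to equal 0.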

In both those examples, the first model has a higher dimension than the second model (for the first example, the zero-mean model has dimension 1). Such is often, but not always, the case. As an example where they have the same dimension, the set of positive-mean Gaussian distributions is nested within the set of all Gaussian distributions; they both have dimension 2.
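One practical consequence of nesting is that the constrained model can never fit better, in terms of maximized likelihood, than the model that contains it. The sketch below illustrates this for the first example (all Gaussians versus zero-mean Gaussians). The data are simulated with arbitrary values and the function name `max_loglik_gaussian` is hypothetical.

```python
import math
import random


def max_loglik_gaussian(data, mean=None):
    """Maximized Gaussian log-likelihood.

    If `mean` is None, the mean is a free parameter (dimension 2);
    otherwise the mean is constrained to the given value (dimension 1).
    """
    n = len(data)
    mu = sum(data) / n if mean is None else mean
    var = sum((x - mu) ** 2 for x in data) / n  # MLE of the variance given mu
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


rng = random.Random(1)
data = [rng.gauss(0.3, 1.0) for _ in range(200)]  # illustrative data only

full = max_loglik_gaussian(data)              # all Gaussian distributions
nested = max_loglik_gaussian(data, mean=0.0)  # zero-mean Gaussian distributions
print(full >= nested)  # True
```

The inequality holds for any data set, because the sample mean minimizes the sum of squared deviations and therefore the estimated variance.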

Comparing models

Comparing statistical models is fundamental for much of statistical inference. Konishi & Kitagawa (2008, p. 75) state: "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling. They are typically formulated as comparisons of several statistical models." Common criteria for comparing models include the following: $R^2$, Bayes factor, Akaike information criterion, and the likelihood-ratio test together with its generalization, the relative likelihood.
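Two of these criteria can be sketched for a pair of nested regression models. The example below is a minimal illustration on simulated data with arbitrary coefficients: it compares an intercept-only Gaussian model (dimension 2) with a straight-line Gaussian model (dimension 3), using the Akaike information criterion, $2k - 2\ln\hat{L}$, and the likelihood-ratio statistic. The helper names `gaussian_loglik` and `aic` are hypothetical.

```python
import math
import random


def gaussian_loglik(residuals):
    """Maximized log-likelihood of i.i.d. zero-mean Gaussian residuals."""
    n = len(residuals)
    var = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln(L-hat), k = model dimension."""
    return 2 * k - 2 * loglik


rng = random.Random(2)
xs = [rng.uniform(0.0, 10.0) for _ in range(300)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]  # illustrative data

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Model A: y = b0 + eps (dimension 2: b0 and the variance).
res_a = [y - mean_y for y in ys]

# Model B: y = b0 + b1*x + eps (dimension 3), fitted by least squares.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b0 = mean_y - b1 * mean_x
res_b = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

ll_a, ll_b = gaussian_loglik(res_a), gaussian_loglik(res_b)
aic_a, aic_b = aic(ll_a, k=2), aic(ll_b, k=3)
lr_stat = 2 * (ll_b - ll_a)  # likelihood-ratio statistic for A nested in B

print(aic_a, aic_b, lr_stat)
```

A lower AIC indicates the preferred model. Because the data here were generated with a nonzero slope, the straight-line model should come out ahead despite its extra parameter.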

Another way of comparing two statistical models is through the notion of deficiency introduced by Lucien Le Cam.[8]

Notes

  1. ^ Cox 2006, p. 178
  2. ^ Adèr 2008, p. 280
  3. ^ a b McCullagh 2002
  4. ^ Cox 2006, p. 197
  5. ^ Konishi & Kitagawa 2008, §1.1
  6. ^ Friendly & Meyer 2016, §11.6
  7. ^ Cox 2006, p. 2
  8. ^ Le Cam, Lucien (1964). "Sufficiency and Approximate Sufficiency". Annals of Mathematical Statistics. 35 (4). Institute of Mathematical Statistics: 1429. doi:10.1214/aoms/1177700372.
