
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).[1] While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.

Graphical model


Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Each edge represents a direct conditional dependency. Any pair of nodes that are not connected (i.e. no path connects one node to the other) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables, and gives (as output) the probability (or probability distribution, if applicable) of the variable represented by the node. For example, if m parent nodes represent m Boolean variables, then the probability function could be represented by a table of 2^m entries, one entry for each of the 2^m possible parent combinations. Similar ideas may be applied to undirected, and possibly cyclic, graphs such as Markov networks.
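
In code, such a conditional probability table (CPT) is naturally a dictionary keyed by parent-value combinations. The sketch below is illustrative only; the helper name and placeholder probabilities are assumptions, not from the article.

```python
from itertools import product

def cpt_rows(num_parents):
    """One row per combination of Boolean parent values: 2**num_parents rows."""
    return 2 ** num_parents

# CPT for a Boolean node with two Boolean parents: maps each of the
# 2**2 = 4 parent combinations to P(node = True | parents).
# The 0.5 entries are placeholders, not meaningful probabilities.
cpt = {combo: 0.5 for combo in product((True, False), repeat=2)}
assert len(cpt) == cpt_rows(2) == 4
```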

Example

[Figure: A simple Bayesian network with conditional probability tables. In the standard version of this example, the tables give Pr(R = T) = 0.2; Pr(S = T | R = T) = 0.01 and Pr(S = T | R = F) = 0.4; and Pr(G = T | S, R) = 0.99, 0.9, 0.8, 0.0 for (S, R) = (T, T), (T, F), (F, T), (F, F).]

Suppose we want to model the dependencies between three variables: the sprinkler (or more appropriately, its state - whether it is on or not), the presence or absence of rain and whether the grass is wet or not. Observe that two events can cause the grass to become wet: an active sprinkler or rain. Rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler usually is not active). This situation can be modeled with a Bayesian network (shown to the right). Each variable has two possible values, T (for true) and F (for false).

The joint probability function is, by the chain rule of probability,

Pr(G, S, R) = Pr(G | S, R) Pr(S | R) Pr(R)

where G = "Grass wet (true/false)", S = "Sprinkler turned on (true/false)", and R = "Raining (true/false)".

The model can answer questions about the presence of a cause given the presence of an effect (so-called inverse probability) like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables:

Pr(R = T | G = T) = Pr(G = T, R = T) / Pr(G = T) = Σ_{x ∈ {T, F}} Pr(G = T, S = x, R = T) / Σ_{x, y ∈ {T, F}} Pr(G = T, S = x, R = y)

Using the expansion for the joint probability function Pr(G, S, R) and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example,

Pr(G = T, S = T, R = T) = Pr(G = T | S = T, R = T) Pr(S = T | R = T) Pr(R = T) = 0.99 × 0.01 × 0.2 = 0.00198

Then the numerical results (subscripted by the associated variable values) are

Pr(R = T | G = T) = (0.00198_TTT + 0.1584_TFT) / (0.00198_TTT + 0.288_TTF + 0.1584_TFT + 0.0_TFF) = 891/2491 ≈ 35.77%
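
The same computation can be carried out mechanically by summing the chain-rule factorization over the nuisance variable. The sketch below assumes the CPT values standardly used with this example; the helper names are illustrative.

```python
from itertools import product

# CPT values assumed from the standard version of the sprinkler example.
P_R = {True: 0.2, False: 0.8}                              # P(R)
P_S_given_R = {True: 0.01, False: 0.4}                     # P(S=T | R)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}   # (S, R) -> P(G=T | S, R)

def joint(g, s, r):
    """Chain-rule factorization Pr(G, S, R) = Pr(G | S, R) Pr(S | R) Pr(R)."""
    pg = P_G_given_SR[(s, r)]
    ps = P_S_given_R[r]
    return (pg if g else 1 - pg) * (ps if s else 1 - ps) * P_R[r]

# Pr(R=T | G=T): sum out the nuisance variable S in numerator and denominator.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(round(num / den, 4))  # 0.3577
```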

An interventional question, such as "What is the probability that it would rain, given that we wet the grass?", is answered by the post-intervention joint distribution function

Pr(S, R | do(G = T)) = Pr(S | R) Pr(R)

obtained by removing the factor Pr(G | S, R) from the pre-intervention distribution. The do operator forces the value of G to be true. The probability of rain is unaffected by the action:

Pr(R | do(G = T)) = Pr(R)

To predict the impact of turning the sprinkler on:

Pr(G, R | do(S = T)) = Pr(G | S = T, R) Pr(R)

with the term Pr(S = T | R) removed, showing that the action affects the grass but not the rain.
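
The truncated factorization can be evaluated directly. The sketch below assumes the same standard CPT values used elsewhere in this example; the names are illustrative.

```python
# CPT values assumed from the standard version of the sprinkler example.
P_R = {True: 0.2, False: 0.8}                              # P(R)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}   # (S, R) -> P(G=T | S, R)

# P(G=T | do(S=T)) = sum_r P(G=T | S=T, R=r) P(R=r):
# the factor P(S | R) is deleted and S is fixed to T.
p_g_do_s = sum(P_G_given_SR[(True, r)] * P_R[r] for r in (True, False))
print(round(p_g_do_s, 3))  # 0.918

# Wetting the grass does not change the probability of rain:
# P(R=T | do(G=T)) = P(R=T), since the factor P(G | S, R) is deleted.
```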

These predictions may not be feasible given unobserved variables, as in most policy evaluation problems. The effect of the action do(X = x) can still be predicted, however, whenever the back-door criterion is satisfied.[2][3] It states that, if a set Z of nodes can be observed that d-separates[4] (or blocks) all back-door paths from X to Y, then

Pr(Y | do(X = x)) = Σ_z Pr(Y | X = x, Z = z) Pr(Z = z)

A back-door path is one that ends with an arrow into X. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if S is not observed, no other set d-separates this path and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case P(G | do(S = T)) is not "identified". This reflects the fact that, lacking interventional data, it cannot be determined whether the observed dependence between S and G is due to a causal connection or is spurious (apparent dependence arising from a common cause, R). (See Simpson's paradox.)
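
The back-door adjustment over Z = R can be contrasted numerically with naive conditioning: because rain both suppresses the sprinkler and wets the grass, P(G=T | S=T) differs from P(G=T | do(S=T)). The sketch below assumes the standard CPT values for this example.

```python
# CPT values assumed from the standard version of the sprinkler example.
P_R = {True: 0.2, False: 0.8}                              # P(R)
P_S_given_R = {True: 0.01, False: 0.4}                     # P(S=T | R)
P_G = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.8, (False, False): 0.0}            # (S, R) -> P(G=T | S, R)

# Back-door adjustment: P(G=T | do(S=T)) = sum_r P(G=T | S=T, r) P(r)
adjusted = sum(P_G[(True, r)] * P_R[r] for r in (True, False))

# Naive conditioning: P(G=T | S=T) = sum_r P(G=T | S=T, r) P(r | S=T)
p_s = sum(P_S_given_R[r] * P_R[r] for r in (True, False))
conditioned = sum(P_G[(True, r)] * P_S_given_R[r] * P_R[r] / p_s
                  for r in (True, False))

print(round(adjusted, 3), round(conditioned, 3))  # 0.918 0.901
```

The two numbers differ because conditioning on S = T makes rain less likely (the sprinkler is rarely on when it rains), while the intervention do(S = T) leaves the rain distribution untouched.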

To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus"[2][5] and test whether all do terms can be removed from the expression of that relation, thus confirming that the desired quantity is estimable from frequency data.[6]

Using a Bayesian network can save considerable amounts of memory over exhaustive probability tables, if the dependencies in the joint distribution are sparse. For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for 2^10 = 1024 values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most 10 · 2^3 = 80 values.
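
The count above can be reproduced directly:

```python
# Storage comparison for 10 Boolean variables with at most three parents each.
n = 10
full_joint = 2 ** n            # one value per joint assignment
per_node_cpt = 2 ** 3          # at most 2**3 rows in each node's conditional table
bn_storage = n * per_node_cpt
print(full_joint, bn_storage)  # 1024 80
```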

One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than complete joint distributions.

Inference and learning


Bayesian networks support three main tasks:

Inferring unobserved variables


Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.

The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow for a space–time tradeoff and match the efficiency of variable elimination when enough space is used. All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are importance sampling, stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation, generalized belief propagation and variational methods.
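
A toy implementation of variable elimination on the sprinkler network is sketched below. The dict-based factor representation and function names are assumptions for illustration; the CPT values are the ones standardly used with this example.

```python
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors, each a dict keyed by value tuples."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for vals in product((True, False), repeat=len(out_vars)):
        asg = dict(zip(out_vars, vals))
        out[vals] = (f1[tuple(asg[v] for v in vars1)]
                     * f2[tuple(asg[v] for v in vars2)])
    return out, out_vars

def sum_out(f, fvars, var):
    """Eliminate var from factor f by summation."""
    keep = [v for v in fvars if v != var]
    out = {}
    for vals, p in f.items():
        asg = dict(zip(fvars, vals))
        key = tuple(asg[v] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return out, keep

fR = {(True,): 0.2, (False,): 0.8}                       # P(R)
fS = {(True, True): 0.01, (True, False): 0.99,
      (False, True): 0.4, (False, False): 0.6}           # (R, S) -> P(S | R)
pG = {(True, True): 0.99, (True, False): 0.9,
      (False, True): 0.8, (False, False): 0.0}           # (S, R) -> P(G=T | S, R)
fG = {(s, r, g): (p if g else 1.0 - p)
      for (s, r), p in pG.items() for g in (True, False)}  # (S, R, G)

# Query P(R | G=T): multiply all factors, sum out S, then condition on G=T.
f, fv = multiply(fR, ['R'], fS, ['R', 'S'])
f, fv = multiply(f, fv, fG, ['S', 'R', 'G'])
f, fv = sum_out(f, fv, 'S')                # fv is now ['R', 'G']
posterior = f[(True, True)] / (f[(True, True)] + f[(False, True)])
print(round(posterior, 4))  # 0.3577
```

This reproduces the result of the sprinkler example; a real implementation would also choose a good elimination order, which is what determines the treewidth-bounded complexity.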

Parameter learning


In order to fully specify the Bayesian network and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution for X conditional upon X's parents. The distribution of X conditional upon its parents may have any form. It is common to work with discrete or Gaussian distributions since that simplifies calculations. Sometimes only constraints on distribution are known; one can then use the principle of maximum entropy to determine a single distribution, the one with the greatest entropy given the constraints. (Analogously, in the specific context of a dynamic Bayesian network, the conditional distribution for the hidden state's temporal evolution is commonly specified to maximize the entropy rate of the implied stochastic process.)

Often these conditional distributions include parameters that are unknown and must be estimated from data, e.g., via the maximum likelihood approach. Direct maximization of the likelihood (or of the posterior probability) is often complex given unobserved variables. A classical approach to this problem is the expectation-maximization algorithm, which alternates computing expected values of the unobserved variables conditional on observed data, with maximizing the complete likelihood (or posterior) assuming that previously computed expected values are correct. Under mild regularity conditions, this process converges on maximum likelihood (or maximum posterior) values for parameters.
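
The alternation can be made concrete with a toy latent-variable problem (the two-coin setup below is an assumed illustration, not from the article): two coins have unknown biases, each session of ten flips uses one coin, and which coin was used is unobserved.

```python
def em(heads, flips, theta=(0.6, 0.5), iters=50):
    """EM for a two-coin mixture: theta holds the two unknown biases."""
    for _ in range(iters):
        exp = [[0.0, 0.0], [0.0, 0.0]]   # expected (heads, tails) per coin
        for h, n in zip(heads, flips):
            # E-step: posterior responsibility of each coin for this session
            like = [t ** h * (1.0 - t) ** (n - h) for t in theta]
            z = like[0] + like[1]
            for c in (0, 1):
                exp[c][0] += like[c] / z * h
                exp[c][1] += like[c] / z * (n - h)
        # M-step: re-estimate each bias from its expected counts
        theta = tuple(e[0] / (e[0] + e[1]) for e in exp)
    return theta

theta = em(heads=[5, 9, 8, 4, 7], flips=[10] * 5)
print([round(t, 2) for t in theta])
```

The coin started at the higher initial guess absorbs the high-heads sessions and its estimate rises, while the other settles near the low-heads sessions' rate.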

A more fully Bayesian approach to parameters is to treat them as additional unobserved variables and to compute a full posterior distribution over all nodes conditional upon observed data, then to integrate out the parameters. This approach can be expensive and lead to large dimension models, making classical parameter-setting approaches more tractable.

Structure learning


In the simplest case, a Bayesian network is specified by an expert and is then used to perform inference. In other applications, the task of defining the network is too complex for humans. In this case, the network structure and the parameters of the local distributions must be learned from data.

Automatically learning the graph structure of a Bayesian network (BN) is a challenge pursued within machine learning. The basic idea goes back to a recovery algorithm developed by Rebane and Pearl[7] and rests on the distinction between the three possible patterns allowed in a 3-node DAG:

Junction patterns
Pattern    Model
Chain      X → Z → Y
Fork       X ← Z → Y
Collider   X → Z ← Y

The first 2 represent the same dependencies (X and Y are independent given Z) and are, therefore, indistinguishable. The collider, however, can be uniquely identified, since X and Y are marginally independent and all other pairs are dependent. Thus, while the skeletons (the graphs stripped of arrows) of these three triplets are identical, the directionality of the arrows is partially identifiable. The same distinction applies when X and Y have common parents, except that one must first condition on those parents. Algorithms have been developed to systematically determine the skeleton of the underlying graph and, then, orient all arrows whose directionality is dictated by the conditional independences observed.[2][8][9][10]
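
The asymmetry of the collider can be checked numerically. The sketch below builds a Boolean collider X → Z ← Y with assumed illustrative parameters and verifies that X and Y are marginally independent but become dependent once Z is observed ("explaining away").

```python
from itertools import product

# Collider X -> Z <- Y; the parameter values are assumed for illustration.
pX, pY = 0.5, 0.5
pZ = {(x, y): 0.9 if (x or y) else 0.1
      for x, y in product((True, False), repeat=2)}      # P(Z=T | X, Y)

def joint(x, y, z):
    px = pX if x else 1 - pX
    py = pY if y else 1 - pY
    pz = pZ[(x, y)] if z else 1 - pZ[(x, y)]
    return px * py * pz

# Marginal independence: P(X=T, Y=T) equals P(X=T) P(Y=T).
pxy = sum(joint(True, True, z) for z in (True, False))

# Dependence given Z=T: P(X=T, Y=T | Z=T) != P(X=T | Z=T) P(Y=T | Z=T).
pz_t = sum(joint(x, y, True) for x, y in product((True, False), repeat=2))
pxy_z = joint(True, True, True) / pz_t
px_z = sum(joint(True, y, True) for y in (True, False)) / pz_t
py_z = sum(joint(x, True, True) for x in (True, False)) / pz_t
print(round(pxy_z, 3), round(px_z * py_z, 3))  # 0.321 0.413
```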

An alternative method of structural learning uses optimization-based search. It requires a scoring function and a search strategy. A common scoring function is posterior probability of the structure given the training data, like the BIC or the BDeu. The time requirement of an exhaustive search returning a structure that maximizes the score is superexponential in the number of variables. A local search strategy makes incremental changes aimed at improving the score of the structure. A global search algorithm like Markov chain Monte Carlo can avoid getting trapped in local minima. Friedman et al.[11][12] discuss using mutual information between variables and finding a structure that maximizes this. They do this by restricting the parent candidate set to k nodes and exhaustively searching therein.

A particularly fast method for exact BN learning is to cast the problem as an optimization problem and solve it using integer programming. Acyclicity constraints are added to the integer program (IP) during solving in the form of cutting planes.[13] This method can handle problems with up to 100 variables.

In order to deal with problems with thousands of variables, a different approach is necessary. One is to first sample one ordering, and then find the optimal BN structure with respect to that ordering. This implies working on the search space of the possible orderings, which is convenient as it is smaller than the space of network structures. Multiple orderings are then sampled and evaluated. This method has been reported to be the best available in the literature when the number of variables is huge.[14]

Another method consists of focusing on the sub-class of decomposable models, for which the MLE has a closed form. It is then possible to discover a consistent structure for hundreds of variables.[15]

Learning Bayesian networks with bounded treewidth is necessary to allow exact, tractable inference, since the worst-case inference complexity is exponential in the treewidth k (under the exponential time hypothesis). Yet, as a global property of the graph, treewidth considerably increases the difficulty of the learning process. In this context it is possible to use K-trees for effective learning.[16]

Statistical introduction


Given data x and parameter θ, a simple Bayesian analysis starts with a prior probability (prior) p(θ) and likelihood p(x | θ) to compute a posterior probability p(θ | x) ∝ p(x | θ) p(θ).

Often the prior on θ depends in turn on other parameters φ that are not mentioned in the likelihood. So, the prior p(θ) must be replaced by a likelihood p(θ | φ), and a prior p(φ) on the newly introduced parameters φ is required, resulting in the posterior probability

p(θ, φ | x) ∝ p(x | θ) p(θ | φ) p(φ)

This is the simplest example of a hierarchical Bayes model.

The process may be repeated; for example, the parameters φ may depend in turn on additional parameters ψ, which require their own prior. Eventually the process must terminate, with priors that do not depend on unmentioned parameters.

Introductory examples


Given the measured quantities x_1, …, x_n, each with normally distributed errors of known standard deviation σ,

x_i ~ N(θ_i, σ²),  i = 1, …, n

Suppose we are interested in estimating the θ_i. An approach would be to estimate the θ_i using a maximum likelihood approach; since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply

θ̂_i = x_i

However, if the quantities are related, so that for example the individual θ_i have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

x_i ~ N(θ_i, σ²),  i = 1, …, n
θ_i ~ N(φ, τ²),  i = 1, …, n

with improper priors φ ~ flat, τ ~ flat ∈ (0, ∞). When n ≥ 3, this is an identified model (i.e. there exists a unique solution for the model's parameters), and the posterior distributions of the individual θ_i will tend to move, or shrink, away from the maximum likelihood estimates towards their common mean. This shrinkage is a typical behavior in hierarchical Bayes models.
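
A small numerical sketch of this shrinkage follows. The plug-in (empirical) estimation of φ and τ² is an assumed simplification; a full hierarchical analysis would integrate over them.

```python
# x_i ~ N(theta_i, s2), theta_i ~ N(phi, t2): the posterior mean of each
# theta_i is a precision-weighted average of x_i and the common mean phi.
x = [1.0, 2.0, 8.0, 9.0, 5.0]
s2 = 4.0                                   # known error variance sigma^2
phi = sum(x) / len(x)                      # empirical grand mean (plug-in)
var_x = sum((v - phi) ** 2 for v in x) / (len(x) - 1)
t2 = max(var_x - s2, 1e-9)                 # crude moment estimate of tau^2

post = [(v / s2 + phi / t2) / (1 / s2 + 1 / t2) for v in x]
# Every posterior mean lies between its MLE x_i and the common mean phi.
for v, p in zip(x, post):
    assert min(v, phi) <= p <= max(v, phi)
print([round(p, 2) for p in post])
```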

Restrictions on priors


Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy such as the variable τ in the example. The usual priors such as the Jeffreys prior often do not work, because the posterior distribution will not be normalizable, and estimates made by minimizing the expected loss will be inadmissible.

Definitions and concepts


Several equivalent definitions of a Bayesian network have been offered. For the following, let G = (V, E) be a directed acyclic graph (DAG) and let X = (X_v), v ∈ V, be a set of random variables indexed by V.

Factorization definition


X is a Bayesian network with respect to G if its joint probability density function (with respect to a product measure) can be written as a product of the individual density functions, conditional on their parent variables:[17]

p(x) = ∏_{v ∈ V} p(x_v | x_{pa(v)})

where pa(v) is the set of parents of v (i.e. those vertices pointing directly to v via a single edge).

For any set of random variables, the probability of any member of a joint distribution can be calculated from conditional probabilities using the chain rule (given a topological ordering of X) as follows:[17]

P(X_1 = x_1, …, X_n = x_n) = ∏_{v=1}^{n} P(X_v = x_v | X_{v+1} = x_{v+1}, …, X_n = x_n)

Using the definition above, this can be written as:

P(X_1 = x_1, …, X_n = x_n) = ∏_{v=1}^{n} P(X_v = x_v | X_j = x_j for each X_j that is a parent of X_v)

The difference between the two expressions is the conditional independence of the variables from any of their non-descendants, given the values of their parent variables.

Local Markov property


X is a Bayesian network with respect to G if it satisfies the local Markov property: each variable is conditionally independent of its non-descendants given its parent variables:[18]

X_v ⫫ X_{V \ de(v)} | X_{pa(v)}    for all v ∈ V

where de(v) is the set of descendants and V \ de(v) is the set of non-descendants of v.

This can be expressed in terms similar to the first definition, as

P(X_v = x_v | X_i = x_i for each X_i that is not a descendant of X_v) = P(X_v = x_v | X_j = x_j for each X_j that is a parent of X_v)

The set of parents is a subset of the set of non-descendants because the graph is acyclic.

Marginal independence structure


In general, learning a Bayesian network from data is known to be NP-hard.[19] This is due in part to the combinatorial explosion of enumerating DAGs as the number of variables increases. Nevertheless, insights about an underlying Bayesian network can be learned from data in polynomial time by focusing on its marginal independence structure:[20] while the conditional independence statements of a distribution modeled by a Bayesian network are encoded by a DAG (according to the factorization and Markov properties above), its marginal independence statements—the conditional independence statements in which the conditioning set is empty—are encoded by a simple undirected graph with special properties such as equal intersection and independence numbers.

Developing Bayesian networks


Developing a Bayesian network often begins with creating a DAG G such that X satisfies the local Markov property with respect to G. Sometimes this is a causal DAG. The conditional probability distributions of each variable given its parents in G are assessed. In many cases, in particular in the case where the variables are discrete, if the joint distribution of X is the product of these conditional distributions, then X is a Bayesian network with respect to G.[21]

Markov blanket


The Markov blanket of a node is the set of nodes consisting of its parents, its children, and any other parents of its children. The Markov blanket renders the node independent of the rest of the network; the joint distribution of the variables in the Markov blanket of a node is sufficient knowledge for calculating the distribution of the node. X is a Bayesian network with respect to G if every node is conditionally independent of all other nodes in the network, given its Markov blanket.[18]
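
Given a parent map, the blanket is easy to compute. The sketch below uses the sprinkler DAG from the example (R → S, R → G, S → G); the function name is illustrative.

```python
# Parent sets of the sprinkler DAG: R -> S, R -> G, S -> G.
parents = {'R': set(), 'S': {'R'}, 'G': {'S', 'R'}}

def markov_blanket(node):
    """Parents, children, and the children's other parents."""
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | co_parents

print(sorted(markov_blanket('S')))  # ['G', 'R']: parent R, child G, co-parent R
```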

d-separation


This definition can be made more general by defining the "d"-separation of two nodes, where d stands for directional.[2] We first define the "d"-separation of a trail and then we will define the "d"-separation of two nodes in terms of that.

Let P be a trail from node u to v. A trail is a loop-free, undirected (i.e. all edge directions are ignored) path between two nodes. Then P is said to be d-separated by a set of nodes Z if any of the following conditions holds:

  • P contains (but does not need to be entirely) a directed chain, u ⋯ ← m ← ⋯ v or u ⋯ → m → ⋯ v, such that the middle node m is in Z,
  • P contains a fork, u ⋯ ← m → ⋯ v, such that the middle node m is in Z, or
  • P contains an inverted fork (or collider), u ⋯ → m ← ⋯ v, such that the middle node m is not in Z and no descendant of m is in Z.

The nodes u and v are d-separated by Z if all trails between them are d-separated. If u and v are not d-separated, they are d-connected.
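
The three conditions can be implemented by enumerating trails in a small DAG and testing each middle node. This is a didactic sketch (practical implementations use reachability algorithms such as Bayes-ball instead of trail enumeration); the graph is the sprinkler DAG.

```python
edges = {('R', 'S'), ('R', 'G'), ('S', 'G')}   # the sprinkler DAG

def children(v):
    return {b for a, b in edges if a == v}

def descendants(v):
    out, stack = set(), [v]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def trails(u, v, seen=()):
    """All loop-free undirected paths from u to v."""
    if u == v:
        yield seen + (v,)
        return
    for a, b in edges:
        nxt = b if a == u else a if b == u else None
        if nxt is not None and nxt not in seen + (u,):
            yield from trails(nxt, v, seen + (u,))

def blocked(trail, Z):
    """Is this trail d-separated by Z?"""
    for i in range(1, len(trail) - 1):
        a, m, b = trail[i - 1], trail[i], trail[i + 1]
        if (a, m) in edges and (b, m) in edges:      # collider a -> m <- b
            if m not in Z and not (descendants(m) & Z):
                return True
        elif m in Z:                                  # chain or fork, m observed
            return True
    return False

def d_separated(u, v, Z):
    return all(blocked(t, Z) for t in trails(u, v))

# The back-door path S <- R -> G is blocked by Z = {R} ...
print(blocked(('S', 'R', 'G'), {'R'}))   # True
# ... but the direct edge S -> G keeps S and G d-connected overall.
print(d_separated('S', 'G', {'R'}))      # False
```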

X is a Bayesian network with respect to G if, for any two nodes u, v:

P(X_u, X_v | X_Z) = P(X_u | X_Z) P(X_v | X_Z)

where Z is a set which d-separates u and v. (The Markov blanket is the minimal set of nodes which d-separates node v from all other nodes.)

Causal networks


Although Bayesian networks are often used to represent causal relationships, this need not be the case: a directed edge from u to v does not require that Xv be causally dependent on Xu. This is demonstrated by the fact that Bayesian networks on the graphs:

a → b    and    a ← b

are equivalent: that is, they impose exactly the same conditional independence requirements.

A causal network is a Bayesian network with the requirement that the relationships be causal. The additional semantics of causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X = x)), then the probability density function changes to that of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x.[2] Using these semantics, the impact of external interventions can be predicted from data obtained prior to intervention.

Inference complexity and approximation algorithms


In 1990, while working at Stanford University on large bioinformatic applications, Cooper proved that exact inference in Bayesian networks is NP-hard.[22] This result prompted research on approximation algorithms with the aim of developing a tractable approximation to probabilistic inference. In 1993, Paul Dagum and Michael Luby proved two surprising results on the complexity of approximation of probabilistic inference in Bayesian networks.[23] First, they proved that no tractable deterministic algorithm can approximate probabilistic inference to within an absolute error ε < 1/2. Second, they proved that no tractable randomized algorithm can approximate probabilistic inference to within an absolute error ε < 1/2 with confidence probability greater than 1/2.

At about the same time, Roth proved that exact inference in Bayesian networks is in fact #P-complete (and thus as hard as counting the number of satisfying assignments of a conjunctive normal form (CNF) formula) and that approximate inference within a factor 2^(n^(1−ε)) for every ε > 0, even for Bayesian networks with restricted architecture, is NP-hard.[24][25]

In practical terms, these complexity results suggested that while Bayesian networks were rich representations for AI and machine learning applications, their use in large real-world applications would need to be tempered by either topological structural constraints, such as naïve Bayes networks, or by restrictions on the conditional probabilities. The bounded variance algorithm[26] developed by Dagum and Luby was the first provable fast approximation algorithm to efficiently approximate probabilistic inference in Bayesian networks with guarantees on the error approximation. This algorithm required the minor restriction that the conditional probabilities of the Bayesian network be bounded away from zero and one by 1/p(n), where p(n) is any polynomial in the number of nodes in the network, n.

Software


Notable software for Bayesian networks include:

  • Just another Gibbs sampler (JAGS) – Open-source alternative to WinBUGS. Uses Gibbs sampling.
  • OpenBUGS – Open-source development of WinBUGS.
  • SPSS Modeler – Commercial software that includes an implementation for Bayesian networks.
  • Stan (software) – Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler (NUTS),[27] a variant of Hamiltonian Monte Carlo.
  • PyMC – A Python library implementing an embedded domain-specific language to represent Bayesian networks, and a variety of samplers (including NUTS).
  • WinBUGS – One of the first computational implementations of MCMC samplers. No longer maintained.

History


The term Bayesian network was coined by Judea Pearl in 1985 to emphasize:[28]

  • the often subjective nature of the input information
  • the reliance on Bayes' conditioning as the basis for updating information
  • the distinction between causal and evidential modes of reasoning[29]

In the late 1980s Pearl's Probabilistic Reasoning in Intelligent Systems[30] and Neapolitan's Probabilistic Reasoning in Expert Systems[31] summarized their properties and established them as a field of study.


Notes

  1. ^ Ruggeri, Fabrizio; Kenett, Ron S.; Faltin, Frederick W., eds. (2025-08-14). Encyclopedia of Statistics in Quality and Reliability (1st ed.). Wiley. p. 1. doi:10.1002/9780470061572.eqr089. ISBN 978-0-470-01861-3.
  2. ^ a b c d e Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. ISBN 978-0-521-77362-1. OCLC 42291253.
  3. ^ "The Back-Door Criterion" (PDF). Retrieved 2025-08-14.
  4. ^ "d-Separation without Tears" (PDF). Retrieved 2025-08-14.
  5. ^ Pearl J (1994). "A Probabilistic Calculus of Actions". In Lopez de Mantaras R, Poole D (eds.). UAI'94 Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence. San Mateo, CA: Morgan Kaufmann. pp. 454–462. arXiv:1302.6835. Bibcode:2013arXiv1302.6835P. ISBN 1-55860-332-8.
  6. ^ Shpitser I, Pearl J (2006). "Identification of Conditional Interventional Distributions". In Dechter R, Richardson TS (eds.). Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. Corvallis, OR: AUAI Press. pp. 437–444. arXiv:1206.6876.
  7. ^ Rebane G, Pearl J (1987). "The Recovery of Causal Poly-trees from Statistical Data". Proceedings, 3rd Workshop on Uncertainty in AI. Seattle, WA. pp. 222–228. arXiv:1304.2736.
  8. ^ Spirtes P, Glymour C (1991). "An algorithm for fast recovery of sparse causal graphs" (PDF). Social Science Computer Review. 9 (1): 62–72. CiteSeerX 10.1.1.650.2922. doi:10.1177/089443939100900106. S2CID 38398322.
  9. ^ Spirtes P, Glymour CN, Scheines R (1993). Causation, Prediction, and Search (1st ed.). Springer-Verlag. ISBN 978-0-387-97979-3.
  10. ^ Verma T, Pearl J (1991). "Equivalence and synthesis of causal models". In Bonissone P, Henrion M, Kanal LN, Lemmer JF (eds.). UAI '90 Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence. Elsevier. pp. 255–270. ISBN 0-444-89264-8.
  11. ^ Friedman N, Geiger D, Goldszmidt M (November 1997). "Bayesian Network Classifiers". Machine Learning. 29 (2–3): 131–163. doi:10.1023/A:1007465528199.
  12. ^ Friedman N, Linial M, Nachman I, Pe'er D (August 2000). "Using Bayesian networks to analyze expression data". Journal of Computational Biology. 7 (3–4): 601–20. CiteSeerX 10.1.1.191.139. doi:10.1089/106652700750050961. PMID 11108481.
  13. ^ Cussens J (2011). "Bayesian network learning with cutting planes" (PDF). Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence: 153–160. arXiv:1202.3713. Bibcode:2012arXiv1202.3713C. Archived from the original on March 27, 2022.
  14. ^ Scanagatta M, de Campos CP, Corani G, Zaffalon M (2015). "Learning Bayesian Networks with Thousands of Variables". NIPS-15: Advances in Neural Information Processing Systems. Vol. 28. Curran Associates. pp. 1855–1863.
  15. ^ Petitjean F, Webb GI, Nicholson AE (2013). Scaling log-linear analysis to high-dimensional data (PDF). International Conference on Data Mining. Dallas, TX, USA: IEEE.
  16. ^ Scanagatta M, Corani G, de Campos CP, Zaffalon M (2016). "Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables". NIPS-16: Advances in Neural Information Processing Systems. Vol. 29.
  17. ^ a b Russell & Norvig 2003, p. 496.
  18. ^ a b Russell & Norvig 2003, p. 499.
  19. ^ Chickering, David M.; Heckerman, David; Meek, Christopher (2004). "Large-sample learning of Bayesian networks is NP-hard" (PDF). Journal of Machine Learning Research. 5: 1287–1330.
  20. ^ Deligeorgaki, Danai; Markham, Alex; Misra, Pratik; Solus, Liam (2023). "Combinatorial and algebraic perspectives on the marginal independence structure of Bayesian networks". Algebraic Statistics. 14 (2): 233–286. arXiv:2210.00822. doi:10.2140/astat.2023.14.233.
  21. ^ Neapolitan RE (2004). Learning Bayesian networks. Prentice Hall. ISBN 978-0-13-012534-7.
  22. ^ Cooper GF (1990). "The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks" (PDF). Artificial Intelligence. 42 (2–3): 393–405. doi:10.1016/0004-3702(90)90060-d. S2CID 43363498.
  23. ^ Dagum P, Luby M (1993). "Approximating probabilistic inference in Bayesian belief networks is NP-hard". Artificial Intelligence. 60 (1): 141–153. CiteSeerX 10.1.1.333.1586. doi:10.1016/0004-3702(93)90036-b.
  24. ^ Roth D (1993). "On the hardness of approximate reasoning". IJCAI.
  25. ^ Roth D (1996). "On the hardness of approximate reasoning". Artificial Intelligence.
  26. ^ Dagum P, Luby M (1997). "An optimal approximation algorithm for Bayesian inference". Artificial Intelligence. 93 (1–2): 1–27. CiteSeerX 10.1.1.36.7946. doi:10.1016/s0004-3702(97)00013-1. Archived from the original on 2025-08-14. Retrieved 2025-08-14.
  27. ^ Hoffman, Matthew D.; Gelman, Andrew (2011). "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo". arXiv:1111.4246 [stat.CO].
  28. ^ Pearl J (1985). Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning (UCLA Technical Report CSD-850017). Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA. pp. 329–334. Retrieved 2025-08-14.
  29. ^ Bayes T, Price R (1763). "An Essay Towards Solving a Problem in the Doctrine of Chances". Philosophical Transactions of the Royal Society. 53: 370–418. doi:10.1098/rstl.1763.0053.
  30. ^ Pearl J (1988). Probabilistic Reasoning in Intelligent Systems. San Francisco, CA: Morgan Kaufmann. ISBN 978-1-55860-479-7.
  31. ^ Neapolitan RE (1989). Probabilistic reasoning in expert systems: theory and algorithms. Wiley. ISBN 978-0-471-61840-9.

References

Russell, Stuart J.; Norvig, Peter (2003). Artificial Intelligence: A Modern Approach (2nd ed.). Upper Saddle River, NJ: Prentice Hall. ISBN 0-13-790395-2.
An earlier version appears as a Microsoft Research report, March 1, 1995. The paper is about both parameter and structure learning in Bayesian networks.
