<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-18292881</id><updated>2012-02-13T22:32:27.019-05:00</updated><category term='ontologies'/><category term='Russia'/><category term='OWL'/><category term='research'/><category term='pronto'/><category term='Pavel on Web'/><title type='text'>Pavel's blog</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>28</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-18292881.post-1240952461422965557</id><published>2010-11-09T18:49:00.004-05:00</published><updated>2010-11-09T19:13:21.659-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OWL'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='Russia'/><title type='text'>Interest in OWL in Russia growing</title><content type='html'>It always felt quite frustrating that interest in OWL/SemWeb technologies in Russia (or even in the former Soviet Union) was very modest, to say the least. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Fortunately, the situation seems to be changing. I'm glad to learn about few projects that involve building OWL ontologies. For example, Dr. Ivan Martynikhin - a young scientist from the St. Petersburg Medical University (&lt;a href="http://spbmu.s-psy.ru/"&gt;Psychiatry and Narcology Chair&lt;/a&gt;) - is trying to build an ontology of psychosis. The aim is to create an ontology that in short term can back up various e-learning materials (that's the Sophia project), and, in long term, future diagnosing or EMR systems.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Another example (which I came across just today) is the group of researchers at the &lt;a href="http://www.sgm.ru/"&gt;Geology Museum (RAS research center)&lt;/a&gt; who run the &lt;a href="https://sites.google.com/site/alex0shkotin/formal-geology"&gt;Formal Geology&lt;/a&gt; project. The project involves a &lt;a href="http://owl.cs.manchester.ac.uk/browser/manage/?action=load&amp;amp;clear=true&amp;amp;uri=http://earth.jscc.ru/ontologies/dic.owl" target="blank"&gt;geology ontology&lt;/a&gt;, some steps towards moving from a relational database of geological facts to an ontology-based representation, and an igneous rock classification algorithm. Docs are all in Russian (shame!) but that's fixable.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Exciting, eh? We should be thinking of organizing events to bring all these great people together. It's quite a challenge to get them to publish at OWLED or similar international workshops (for all sorts of reasons).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Finally, it'd be a remiss to not mention the &lt;a href="http://semanticfuture.net/"&gt;semanticfuture.net&lt;/a&gt; founders who provided a perfect opportunity for these people to share their interests and attract attention.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-1240952461422965557?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/1240952461422965557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=1240952461422965557' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/1240952461422965557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/1240952461422965557'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2010/11/interest-in-owl-in-russia-growing.html' title='Interest in OWL in Russia growing'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-8511919448062348706</id><published>2010-02-27T12:18:00.006-05:00</published><updated>2010-02-27T13:13:24.160-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>What's been happening in the last two years... in a nutshell!</title><content type='html'>Well, yeah, it eventually became crystal clear that either I get responsible and keep the blog updated or I should simply delete it. The latter option is a lot easier but I thought that yet another attempt won't hurt. Or will it?&lt;div&gt;I guess it kinda makes sense to try to summarise what I've been doing for the last two years. I'm not expecting anyone to seriously read this (since the only people who are supposed to be interested are my advisers but they know all that stuff anyway) but it may help me sorta look back and re-assess my efforts.&lt;/div&gt;&lt;div&gt;So in Summer 2008 it became fairly obvious that we needed a principally new approach to make Pronto usable (I'm not using the word *scalable* here). At that time Pronto was able to deal with 15-18 probabilistic statements. There was no hope to handle more because the linear systems constructed in the process of reasoning were *always* exponential. So it wasn't a problem of solving them, it was a problem of even representing them in memory. All attempts to shrink their size appeared futile.&lt;/div&gt;&lt;div&gt;But then we came across column generation. Its idea is that one does not need the entire linear system to be computed before solving it. Simply put, you get some subset of variables, solve the restricted system, check if it's optimal (i.e. if the solution is the same as would be for the full system) and if not - you generate and add new columns. Otherwise you stop. Obviously this may work iff there's a hope of stopping early, i.e. before you generate an exponential number of variables. Fortunately, the theory of LP gives us that hope: any optimal solution contains only a polynomial number of non-zero variables. Since a variable being zero is the same as non-existent variable, we only need to generate a polynomial number of variables. In the ideal world.&lt;/div&gt;&lt;div&gt;In the real world there're two problems. First, you don't know which variables you need. Second, it's hard to generate those columns which are good (or are likely to be good). There's a whole theory of column generation talking about how to solve these two problems. We'll call the first problem The Convergence Problem (CP) and the second - The Column Generation Problem (CGP). One thing to notice here is that they are related: the better you solve CGP (i.e. the better columns you generate), the earlier you'll stop (i.e. the convergence is faster). &lt;/div&gt;&lt;div&gt;So we started looking into column generation in Fall 2008. At first it wasn't clear what those columns (or variables) represent. They actually correspond to *possible worlds* over which probabilistic models are defined. So the CGP problem could be formulated as the problem of generating possible worlds which are likely to have non-zero probability for a given KB and the query (entailment).&lt;/div&gt;&lt;div&gt;At that point I stumbled at the first major difficulty: I needed to generate not just worlds (which are basically conjunctive class expressions over the relevant signature) but *possible* worlds or expressions that would be satisfiable given the KB. Well, a column is an array of rationals (i.e. vector of coefficients). It corresponds to a class expression.  Formulating constraints on those rationals which would ensure satisfiability of the expression is NOT easy. Their structure depends on the KB and it was unclear if they could be captured using a set of linear inequalities.&lt;/div&gt;&lt;div&gt;But we found the way to go. The basic idea was that we don't necessarily need all those constraints in advance. We could generate a column, get the corresponding class expression and  check its satisfiability. If it's not satisfiable then we could find out why (i.e. get the combination of rationals that cause the unsatisfiability) and prohibit such combination using a linear inequality. This is our hybrid "generate-n-check" column generation algorithm.&lt;/div&gt;&lt;div&gt;The second big headache was the CGP itself. It was originally formulated a constraint optimization problem (COP). The good news was that its size was polynomial in the number of probabilistic statements. The bad news was that it was still too computationally hard. Our constraint solver never got sufficiently scalable despite using all sorts of advanced algorithms (like variations of the Russian Doll Search, partitioning, etc.). Bounding in the branch-n-bound algorithm was crap. Pronto could scale up to 50 constraints or so. Still a nice improvement over 15-18 but pretty frustrating given 3-4 months that went into it. Spring 2009 looked pretty bleak.&lt;/div&gt;&lt;div&gt;And then came the break through. I'm deeply grateful to &lt;a href="https://www.msu.edu/~rubin/"&gt;Dr. Paul Rubin&lt;/a&gt; - an operation research scientist - whose invaluable help on the Google op-research board led me to the idea of finding a better formulation for the CP problem. Something easier than COP. More scalable. Something for which good algorithms and tools were available. Like MILP (mixed-integer linear programming) with all the highly optimized solvers, like CPLEX, GLPK, MOSEK, GAMS, etc-etc.&lt;/div&gt;&lt;div&gt;It felt great and weird. Great because Pronto started to fly (er.. comparatively to the previous versions). Weird, because all my COP efforts were worthless (yeah, got skills and experience but still...)&lt;/div&gt;&lt;div&gt;There was still a lot work since then. We've implemented inconsistency analysis and pinpointing algorithms which would find all minimal sets of inconsistent probabilistic statements (a non-trivial stuff). We've improved the non-monotonic entailment algorithms. We've put a lot of efforts into solving the convergence problem (which still kicks my ass, to be honest).  We've investigated theoretical aspects of P-SHIQ (our probabilistic logic) and showed its relation to the standard first-order probabilistic logic.&lt;/div&gt;&lt;div&gt;I should find time to blog on all these things. That's all for now.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-8511919448062348706?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/8511919448062348706/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=8511919448062348706' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/8511919448062348706'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/8511919448062348706'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2010/02/whats-been-happening-in-last-two-years.html' title='What&apos;s been happening in the last two years... in a nutshell!'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-3901229688067023828</id><published>2008-07-16T05:31:00.010-05:00</published><updated>2008-12-10T07:24:48.240-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><title type='text'>Pronto 0.2 released</title><content type='html'>As you may have noticed &lt;a href="http://clarkparsia.com/"&gt;C&amp;amp;P&lt;/a&gt; recently released the next minor version of &lt;a href="http://pellet.owldl.com/pronto"&gt;Pronto&lt;/a&gt; – our probabilistic DL reasoner built on top of Pellet – &lt;a href="http://pellet.owldl.com/download/pronto-0.2"&gt;Pronto 0.2&lt;/a&gt;. In what follows I’ll briefly summarizes the main improvements comparably to 0.1.&lt;br /&gt;First off, Pronto 0.2 – is an optimization/bug fixing release meaning that there are no reasonably important new features (at least as far end users are concerned). We have very much been concerned with performance/scalability issues which were (and some still are) rather glaring in 0.1.&lt;br /&gt;Here I must say that the success w.r.t. performance and scalability were quite different. There’re pretty substantial performance improvements, i.e. Pronto 0.2 is more predictable and few times faster than Pronto 0.1 when running on the same set of samples. Simply put, if Pronto 0.1 is capable of reasoning with some probabilistic ontology, Pronto 0.2. will do it much faster. How much faster depends on the KB.&lt;br /&gt;Regarding scalability, the progress has been less impressive meaning that it still can’t scale substantially beyond ~15 generic (TBox) conditional constraints. Being a little more technical here: it’s polynomially better than Pronto 0.1 but that doesn’t entirely solve the problem with exponential complexity of probabilistic DL reasoning. More news about the ongoing progress in that direction is expected towards the end of summer 2008.&lt;br /&gt;Now let’s turn to the nature of the performance improvements. The main is the new lexicographic entailment algorithm that replaced the original Lukasiewicz TLexEnt procedure [1] that was based on brute force search for the preferred models (recall that Pronto is a non-monotonic reasoner, so it infers knowledge from the set of preferred models of the KB, not from the set of all models). The new algorithm is based on the idea of carefully identifying and eliminating small pieces of the KB that can’t be satisfied by the preferred models. This is similar to computing maximal satisfiable (minimal unsatifiable) subsets of ontology – techniques that are used for debugging classical OWL ontologies. The algorithm helps to make reasoning a lot more predictable and robust, i.e. reduces the difference between the “easy” ontologies and “hard” ones that previously required 10-15x reasoning time (in spite of having the same size). More detailed technical description of the algorithm can soon be found in our paper that was recently accepted to ISWC’08 [2].&lt;br /&gt;There are also other optimization techniques most of which try to reduce the excessive amount of classical DL reasoning performed during the probabilistic entailment.&lt;br /&gt;Improvements (i.e. robustness and speed-up) are illustrated on the following diagram (taken from the ISWC paper). It compares the performance of Pronto 0.1 and Pronto 0.2 on a set of randomly generated probabilistic ontologies. Y axis shows the reasoning time and X axis shows the metric that we developed to assess “hardness” of specific probabilistic KBs before reasoning with them. Basically we wanted to understand what makes KBs hard for reasoning and it seemed that “connectivity”, i.e., number of all possible subsumption relations between OWL classes in probabilistic statements might matter (the higher connectivity - the easier must be the KB). The results show that in addition to simply being faster Pronto 0.2 is also more robust as there are no “hard outliers” – KBs that take a lot more reasoning time comparably to others of the same size. And it’s more predictable, i.e. by using the metric (which is fast to compute) one can roughly estimate reasoning time (if using Pronto 0.2).&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_-Bluxyl5dy8/SH3Q4a7ey_I/AAAAAAAACEY/Mwy_RuFHwrg/s1600-h/comparison.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_-Bluxyl5dy8/SH3Q4a7ey_I/AAAAAAAACEY/Mwy_RuFHwrg/s320/comparison.jpg" alt="" id="BLOGGER_PHOTO_ID_5223560810539699186" border="0" /&gt;&lt;/a&gt;I am hoping to make the evaluation results repeatable by other people. To achieve this we started a small project &lt;a href="http://www2.cs.man.ac.uk/%7Eklinovp/projects/prevaldl/index.html"&gt;PREVAL_DL&lt;/a&gt; which is a reasoner-independent tool for evaluating the performance of P-SHIQ(D) reasoners. It can generate instances of probabilistic reasoning problems and run your favorite P-SHIQ(D) reasoner through them (Pronto and &lt;a href="http://www.sts.tu-harburg.de/%7Et.naeth/#research"&gt;ContraBovemRufum &lt;/a&gt;are the only available so far).&lt;br /&gt;&lt;br /&gt;We are currently working on the scalability problem. Hopefully soon we’ll have a better idea of how large probabilistic ontologies can still be reasoned with. Scalability improvements after that point will likely be concerned with partitioning/decomposing probabilistic KBs onto smaller probabilistically independent fragments.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[1] T. Lukasiewicz, Expressive Probabilistic Description Logics, Artificial Intelligence, 172(6-7), 852-883, April 2008&lt;br /&gt;&lt;br /&gt;[2] P. Klinov and B. Parsia, Optimization and Evaluation of Reasoning in Probabilistic Description Logic: Towards a Systematic Approach, accepted to the International Semantic Web Conference, Karlsruhe, October 2008&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-3901229688067023828?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/3901229688067023828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=3901229688067023828' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3901229688067023828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3901229688067023828'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2008/07/pronto-02-released.html' title='Pronto 0.2 released'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_-Bluxyl5dy8/SH3Q4a7ey_I/AAAAAAAACEY/Mwy_RuFHwrg/s72-c/comparison.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-2745436834402354926</id><published>2007-11-23T23:18:00.000-05:00</published><updated>2007-11-23T23:35:22.312-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>Few extra features for P-DL</title><content type='html'>I'm starting using P-DL as an abbreviation for Probabilistic DL. Hope that's clear enough.&lt;br /&gt;&lt;br /&gt;In my last post I whined about the complications of adding complex probabilistic formulas to P-DL without any good justification that they are actually needed. Let me now propose something else that seems to be easier to implement and might even be more useful.&lt;br /&gt;Recall that Pronto needs numbers (intervals) for each of its conditional constraints. In some cases, however, it might make sense to express *relative* probabilistic constraints w/o any reference to numbers. For example: &lt;span style="font-style: italic;"&gt;Given the symptoms, the chance of having disease A is twice as high as the chance of having B.&lt;/span&gt; So, we may not be willing to commit to numbers (or may not even know the numbers!) but still want to express the constraint. Why not to allow this?&lt;br /&gt;This can be represented by expressions of the form:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;k1*P(psi1|phi1)[l1,u1]+...+&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;kn*P(psin|phin)[ln,un]&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; &gt;= 0&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;The semantics is also pretty clear - we just need to impose an extra restriction on the possible probability distributions. But this is just *one* inequality so allowing such expressions is no harder than supporting constraints themselves. That's good.&lt;br /&gt;Next step - to implement it? Yes, I need to sort out nominals first.. =yawn=&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-2745436834402354926?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/2745436834402354926/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=2745436834402354926' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/2745436834402354926'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/2745436834402354926'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/11/few-extra-features-for-p-dl.html' title='Few extra features for P-DL'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-5153676073469255719</id><published>2007-11-23T23:05:00.000-05:00</published><updated>2007-11-23T23:18:10.307-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>Follow-up on complex formulas</title><content type='html'>In my previous post I raised few questions about allowing complex probabilistic formulas in probabilistic DL. The main issue is that neither P-SAT nor TLogEnt seem to be tractable because of no obvious translation to a single LP instance would do the job.&lt;br /&gt;It can, however, be solved by a number of LP instances. Basically, the idea (as suggested by Lukasiewicz in private communication) is to convert the formula to DNF and non-deterministically solve N LP instances (where N is the number of disjuncts).&lt;br /&gt;This doesn't look tractable. To make things even worse, some of conjuncts might be negations which means even a  single disjunct might have to be broken onto several LP instances! For example, assume we have smth like:&lt;br /&gt;(psi1|phi1)[l1,u1] \or ((psi2|phi2)[l2,u2] \and \neg{(psi1|phi1)[l3,u3]})&lt;br /&gt;The way this can be handled is to generate 3 LP instances:&lt;br /&gt;- check if LP for (psi1|phi1)[l1,u1] is solvable. If yes, then stop and answer "yes"&lt;br /&gt;- if no, check that LP for (psi2|phi2)[l2,u2] is solvable. If no, stop with "no"&lt;br /&gt;- if yes, check that LP for (psi3|phi3)[l3,u3] is solvable. If no, stop with "yes", otherwise - stop with "no".&lt;br /&gt;Horrible, isn't it? Of course, there might be heuristics as to minimize the number of LPs to be solved. In addition generated LP instances can be reused. Pronto experience shows that generation of an LP problem is a lot more expensive than to solve it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-5153676073469255719?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/5153676073469255719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=5153676073469255719' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/5153676073469255719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/5153676073469255719'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/11/follow-up-on-complex-formulas.html' title='Follow-up on complex formulas'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-7988394326218176061</id><published>2007-11-01T23:10:00.000-05:00</published><updated>2007-11-01T23:48:50.811-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>Probabilistic Description Logic and Complex Formulas</title><content type='html'>It's been a while since I implemented the first version of Pronto and only now I realized that P-SHIN (which is a basic of Pronto) doesn't allow complex probabilistic expressions (call them formulas or axioms, assertions, whatever). What I mean by &lt;span style="font-style: italic;"&gt;complex expression&lt;/span&gt; is simply a combination of conditional constraints linked using conventional logical connectives - and, or, neg, impl, with obvious relationships. Consider the example:&lt;br /&gt;In Pronto, we can say that for some probabilistic individual, its probability of being a penguin is 90% - (Penguin|owl:Thing)[0.9;0.9] (so called PABox axiom). We can also say smth else, like (Alive|owl:Thing)[0;0.1].  It's pretty easy to observe that both are in fact equivalent to (Penguin|owl:Thing)[0.9;0.9] \and (Alive|owl:Thing)[0;0.1]  according to the obvious semantics of \and. Okay, but why shall we stop here and not allow something like: (Penguin|owl:Thing)[0.9;0.9] \or (Alive|owl:Thing)[0;0.1]? Or even (Penguin|owl:Thing)[0.9;0.9] -&gt; (Alive|owl:Thing)[0;0.1]? In other words, why not to built complex expressions from the conditional constraints similarly to how propositional formulas are built from atoms.&lt;br /&gt;Something similar was proposed by Lukasiewicz in his paper &lt;a href="http://citeseer.ist.psu.edu/306295.html"&gt;"Probabilistic Default Reasoning with Conditional Constraints"&lt;/a&gt;  where he inductively builds so called strict probabilistic formulas from strict conditional constraints. Note, that he allows only &lt;span style="font-style: italic;"&gt;strict &lt;/span&gt;constraints to participate in such expressions and not defaults (probably because it's semantically unclear what should be a conjunction/negation of defeasible statements). Okay, given that PABoxes in Pronto never have defeasible constraints, we can do the same.&lt;br /&gt;However, in his paper on Probabilistic DL &lt;a href="http://www.kr.tuwien.ac.at/research/reports/rr0605.pdf"&gt;P-SHOQ&lt;/a&gt;, Lukasiewicz did not propose that. Why? Well, there might be several reasons. The first (and the most obvious) is that it's not quite clear if satisfiablity of a probabilistic default theory *plus* of a set of such complex probabilistic formulas can be decided  more or less efficiently (yeah, I know, that it's hard to speak about efficiency when dealing with huge linear systems, but still..). For example, given the semantics of "I|=A or B iff I|=A or I|=B" where both A a and B are conditional constraints, it seems that it might be necessary to non-deterministically solve two linear systems . Negation is also tricky (though can be omitted for the moment).&lt;br /&gt;So, seems to be an interesting research question. But more importantly, do we actually need such complex probabilistic ABoxes? Would someone from, say, medical domain, be interested in the ability to express something like: &lt;span style="font-style: italic;"&gt;"either John has flu with probability at least 90% or he fakes his illness with probability between 70% and 80%"&lt;/span&gt;? If yes, then we can try to think about reasonably efficient generalizations of satisfiability/entailment procedures.&lt;br /&gt;&lt;br /&gt;PS. Hum, Lukasiewicz doesn't say anything about deciding satisifiability of strict probabilistic formulas w.r.t. probabilistic default theories in &lt;a href="http://citeseer.ist.psu.edu/306295.html"&gt;"Probabilistic Default Reasoning with Conditional Constraints"&lt;/a&gt;. Maybe he did elsewhere?&lt;br /&gt;&lt;br /&gt;PPS. Does any one know an easy way to generate pictures from TeX expressions and embed them into blog posts similarly to Wikipedia pages?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-7988394326218176061?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/7988394326218176061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/7988394326218176061'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/11/probabilistic-description-logic-and.html' title='Probabilistic Description Logic and Complex Formulas'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-8469426109545640716</id><published>2007-10-21T22:03:00.000-05:00</published><updated>2007-10-22T16:22:56.961-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><title type='text'>[Pronto] on its ability to generalize</title><content type='html'>&lt;div style="text-align: justify;"&gt;When reading a perfect paper-survey  by  Simon Parsons on  current approaches  to handling imperfect knowledge in DB and KB [1], I noticed a fairly interesting example. It says: "&lt;span style="font-style: italic;"&gt;... Amarger et al. [2] are essentially doing a series of propositions, whose relationships are stated using imprecisely stated probabilities, and inferring new probabilistic relationships between them. Thus given the information that between 70% and 90% students do sport, and between 85% and 90% of students are young, they show how to conclude what proportion of young people do sport...&lt;/span&gt;"&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;This got me curious. Can Pronto do something like that? Formally, given that (Sportsman|Student)[0.7;0.9] and (Young|Student)[0.85;0.9] can it conclude anything meaningful about (Sportsman|Young)[l,u]? Well, quick check revealed that it can't, that answer was [0;1] - &lt;span style="font-style: italic;"&gt;_cant_say_anything_ &lt;/span&gt;which is expected, of course. Note, that neither Sportsman nor Young is a sub-class of Student, so why would we want to conclude anything about the relationship between potentially broader classes? From a very cautious, rigorously probabilistic viewpoint, we cannot. What if the probability of Student is zero, for instance?&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Another query would be something like (StudentSportsman|YoungStudent)[l,u] (i.e. same thing but Sportsman and Young converted to obvious subclasses of Student). This query results in [0.7;0.9] which isn't a surprise either. Indeed,  suppose we get a random individual who is a young student. What do we know about him? Well, he's a student. Anything else? Can we gain something from  (YoungStudent|Student)[0.85;0.9]? Nope. So, we're forced to an unpleasant conclusion that the result will be fully determined by (StudentSportsman|Student)[0.7;0.9].&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;This is somewhat discouraging - yet another evidence that probabilities don't always produce the results that humans would expect. Normally, having a sample data, e.g., students and knowing how many of them are young and/or do sports, you might get a rough estimate about youngsters in general. It's debatable whether it would be a good estimate, but you *do get* some idea. Here, you get nothing because of being too cautious.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Does it mean that Pronto entailment relation is too weak? Well, it may or may not be, depending on what it's used for.&lt;br /&gt;&lt;br /&gt;[1] S. Parsons, "Current Approaches to Handling Imperfect Information in Data and Knowledge Bases," &lt;span style="font-style: italic;"&gt;IEEE Transactions on Knowledge and Data Engineering&lt;/span&gt;, 8(3), pp. 353-372, 1996&lt;br /&gt;[2] A. Amarger, D. Dubois and H. Prade, "&lt;span style="font-style: italic;"&gt;Constraint Propagation with Imprecise Conditional Probabilities,&lt;/span&gt; " Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence, pp. 26-34&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-8469426109545640716?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/8469426109545640716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/8469426109545640716'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/10/pronto-on-its-ability-to-generalize.html' title='[Pronto] on its ability to generalize'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-2097565231785295113</id><published>2007-10-18T10:29:00.000-05:00</published><updated>2007-10-18T10:40:14.219-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pronto'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>Comments on HCLS/Uncertainty Use Cases</title><content type='html'>As some people might know from &lt;a href="http://clarkparsia.com/weblog/2007/09/27/introducing-pronto/"&gt;my Clark&amp;amp;Parsia weblog post&lt;/a&gt;, I've spent my summer practical training on extending &lt;a href="http://pellet.owldl.com/"&gt;Pellet&lt;/a&gt; with probabilistic capabilities. Now, we're trying to promote the tool by showing how it might help people in real-life cases. Some of such cases were summarized by &lt;a href="http://esw.w3.org/topic/HCLS/UncertaintyUseCases"&gt;W3C Semantic Web Healthcare and Life Sciences Interest Group&lt;/a&gt;. Let me comment on those:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Hypothesis Uncertainty&lt;/span&gt;&lt;br /&gt;    &lt;span style="font-style: italic;"&gt;* Mutations in the alpha synuclein could cause Parkinsons Disease&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    The problem here is how to quantify "could cause". As long as this is done, a constraint can be added using evidence class "AlphaSynucleinMutation" and conclusion class "ParkinsonDisease". Pronto supports 2nd level uncertainty (meta-uncertainty - uncertainty in probability estimation), so this should not be a huge problem.&lt;br /&gt;    &lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Hypotheses of relationships based on statistical analysis of microarray data associated with p-values, confidence intervals, etc.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Gene Ontology Evidence codes in support of a particular GO annotation of a gene&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Evidence classes in the OBO Evidence Ontology&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    It would be interesting to take a look at the ontology, but it seems that evidence classes if turned into OWL classes might serve as perfect evidences for conditional constraints. Hypotheses will be conclusions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Interpretation/Classification Uncertainty&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * The patient has elevated cholesterol based on his reading of X mg/dl&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Given the same set of symptoms, Doctor X and Y come up with diagnosis of mild and severe disease respectively&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    Ok, let's say ABCD is our compound evidence class (where A,B,C,D a classes-symptoms), MD - mild disease class and SD - severe disease class. Then, I guess there should by some sort of statistics that could be expressed in a pair of constraints - (MD|ABCD)[l_1,u_1] and (SD|ABCD)[l_2, u_2]. Having an individual _a_, both doctors add a probabilistic ABox assertion (ABCD|Top)[l_a,u_a] for _a_ (or maybe even strict one - _a_:ABCD). Till now everything is perfectly supported by Pronto. Then, Pronto will compute results (MD|Top)[l_amd, u_amd] and (SD|Top)[l_asd, u_asd]for _a_. The question is, what doctors should conclude? They may either favour one diagnosis to another basing on probabilities or override the probabilities for _a_ (say, (MD|Top)[0.2, 0.4] and doctor X explicitly adds (MD|Top)[0.7, 0.8] for _a_). Having enough number of *overriden individuals* one may even adjust generic constraints (MD|ABCD) and (SD|ABCD)&lt;br /&gt;   &lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * True/False Positive/Negative rates of patient classifications and diagnoses. Use of measures such as Precision, Recall, PPV, NPV, etc.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    Not sure I understand this&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Prediction-oriented Uncertainty&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * A person with the BRCA1 gene has a disposition towards Breast Cancer with 70% probability in the future &lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    See &lt;a href="http://clarkparsia.com/weblog/2007/10/02/using-pronto/"&gt;BRCA&lt;/a&gt; model&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Belief oriented uncertainty&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * It is believed to the best of our knowledge that a particular gene is not implicated in a particular disease&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    This is the problem of independence that is not currently captured in Pronto by any means. Basically, it doesn't hurt the representation (besides of making it less explicit). As long as there're no constraints connecting the gene and the disease (even indirectly), no knowledge about the presense of the gene affects the reasoning results concerning the disease. The problem is performance because the reasoning engine _doesn't know_ that the gene is irrelevant to the disease and will take it into account when constraining the probability interval on the disease.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Associated non-monotonicity with the above, i.e., if more knowledge is available, the statement could be proven false. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    Pronto is inherently non-monotonic, i.e. it supports defeasible knowledge. Unfortunately, as long as independence can't be explicitly represented by an axiom, the axiom can't be defeated :) Instead, if the gene _is known_ to be implicated in the disease, this statement might be overriden (defeated) for some particular individual (for whome we know something else, say, the presense of another gene that diminishes the harm of the former).&lt;br /&gt;   &lt;br /&gt;    I must say that Pronto currently doesn't support any belief updating mechanism, e.g. accumulated knowledge about certain individuals doesn't by any means affect generic default knowledge. I can imagine this to be desirable. This might be implemented on top of the core representation and reasoning servises. This also correlates with learning conditional constraints from data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Data Source based Uncertainty&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Samples from the same patient are analyzed by different labs. Lab 1 results show an 80% probability of Disease 1, whereas Lab2 shows a 90% probability for the same.&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    That's the perfect example of uncertainty intervals: constraint (Disease1|Evidence)[0.8;0.9] would naturally model the case.&lt;br /&gt;   &lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * If the Cleveland Clinic says that Avandia is bad for Diabetes, the statement has a higher value of certainty as opposed to an individual Dr. X &lt;/span&gt;&lt;br /&gt;    &lt;br /&gt;    Pronto operates with intervals (see the prev. bullet) but it doesn't assume any bias in distribution of actual probabilities w/in the intervals. In other words, if the probability that A is subsumed by B is w/in [l,u], it means that it can be anywhere between l,u and the representational language doesn't allow to specify whether it's "more closer" to _l_ or to _u_. Equipping Pronto with some predefined set of distributions, e.g., normal, etc., might be an interesting problem. Obviously, it won't affect satisfiability and consistency, but we might be able to compute smth like expectations for inferences.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;# Data Uncertainty&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Approximate location of a clinical feature, e.g, tumor in spatial location in the human body as captured in radiological image or any other digital artifact&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Data inconsistency and incompletenes encountered in Healthcare and Drug Database&lt;/span&gt;s&lt;br /&gt;   &lt;br /&gt;    This is again related to learning certainty intervals from data which can be incomplete, say, in a relational table some attribute values are either unknown (missing) or unreliable. One way of computing probabilities from such data is to use the approach similar to rule induction from incomplete information tables (see &lt;a href="http://lightning.eecs.ku.edu/c95-perugia.pdf"&gt;http://lightning.eecs.ku.edu/c95-perugia.pdf&lt;/a&gt;) Basically, if we could come up with approximations of classes based on the data we have, we can approximate the subsumption relationship between them and represent lower and upper approximations as probabilities. After that Pronto should be able to reason.&lt;br /&gt;   &lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Data uncertainty introduced due to sampling errors, sampling rates, etc.)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Data uncertainty introduced due to the limitations (least count error?) of the device measuring patient characteristics (e.g., temperature)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;    * Data uncertainty introduced due to limitation of instruments used to collect experimental data, e.g., micro-arrays&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;    All these are related to handling incompleteness. In general, this situation is often referred to as _granularity_ of knowledge. It's barely feasible to capture the knowledge on the finest granularity level due to measurement limitations - that's one of the reasons we have intervals for probabilities, not just probabilities. These questions aren't directly related to Pronto but are rather more generic issues of quantifying and representing uncertainty as well as computing approximate concepts from uncertain data (see, e.g., &lt;a href="http://www2.cs.uregina.ca/%7Eyyao/concept_lattice/fca_app.pdf"&gt;http://www2.cs.uregina.ca/~yyao/concept_lattice/fca_app.pdf&lt;/a&gt;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-2097565231785295113?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/2097565231785295113'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/2097565231785295113'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/10/comments-on-hclsuncertainty-use-cases.html' title='Comments on HCLS/Uncertainty Use Cases'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-4649041044701517535</id><published>2007-10-18T10:27:00.000-05:00</published><updated>2007-10-18T10:29:18.267-05:00</updated><title type='text'>Me back again</title><content type='html'>Yeah, it's been another 8-9 months since my last post. I must admit that I'm not very organized and can't write here on a regular basis (as I should because posting certainly helps in organizing and structuring ideas). Oh well.. Another try?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-4649041044701517535?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/4649041044701517535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/4649041044701517535'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/10/me-back-again.html' title='Me back again'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-1381909040582687729</id><published>2007-01-21T21:31:00.000-05:00</published><updated>2007-01-21T21:36:58.691-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><title type='text'>Mereology: Set and Class</title><content type='html'>...just wondering if anyone is proficient in Lesniewski mereology and understands the motivation behind the "set-of" property? Let me remind you:&lt;br /&gt;&lt;em&gt;Object x is said to be a set of objects with property m iff:&lt;/em&gt;&lt;br /&gt;&lt;em&gt;for any y if y-object-m and y-ingr-x then there exist objects z and t such that:&lt;/em&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;em&gt;z-ingr-y&lt;/em&gt;&lt;/li&gt;&lt;li&gt;&lt;em&gt;z-ingr-t&lt;/em&gt;&lt;/li&gt;&lt;li&gt;&lt;em&gt;t-ingr-x&lt;/em&gt;&lt;/li&gt;&lt;li&gt;&lt;em&gt;t-object-m&lt;/em&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;So it can be read smth like:&lt;/p&gt;&lt;p&gt;&lt;em&gt;x is a set of objects with property m iff for any ingredient of x having the property m, it has to have an ingredient that is an ingredient of that part of x that has property m.&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Yeah, don't tell me it' horrible. I know. So, what was Lesniewski intuition when he came up with it?&lt;/p&gt;&lt;em&gt;&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-1381909040582687729?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/1381909040582687729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/1381909040582687729'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/01/mereology-set-and-class.html' title='Mereology: Set and Class'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-3200693377440529895</id><published>2007-01-21T11:46:00.000-05:00</published><updated>2007-01-21T21:46:10.334-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Pavel on Web'/><title type='text'>Homepage</title><content type='html'>It finally happened! My new homepage (&lt;a href="http://homepages.uc.edu/~klinovp"&gt;http://homepages.uc.edu/~klinovp&lt;/a&gt;) is now accessible. Yes, there're lots of "under construction" pages, but don't worry, one day they'll in place too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-3200693377440529895?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3200693377440529895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3200693377440529895'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/01/homepage.html' title='Homepage'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-3394013133129837250</id><published>2007-01-15T18:21:00.000-05:00</published><updated>2007-01-15T18:26:42.197-05:00</updated><title type='text'>Welcome back, Pavel!</title><content type='html'>well, at some point I started thinking that I'd never be back... No because of being incredibly busy, it's just laziness that was holding me back. And nonetheless I'm back! &lt;br /&gt;Smth has happened during my absence here. The biggest thing is that I've passed PhD Qualifier and now can dive deeply in my research. Yes-yes, I outscored all other PhD students in Computer Science and Engineering! Are they too bad or am I exceptionally good? Neither is true, I suppose... It just happened, it's in the past now, it's a part of the history.&lt;br /&gt;Hopefully, I'll be more regular here now. We'll see.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-3394013133129837250?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3394013133129837250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/3394013133129837250'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2007/01/welcome-back-pavel.html' title='Welcome back, Pavel!'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113321959977728273</id><published>2005-11-28T17:52:00.000-05:00</published><updated>2005-11-28T18:13:19.840-05:00</updated><title type='text'>Research Proposal for KR class</title><content type='html'>This seems to be a sort of fundamental problem - Larry asks his students to come up with some kind of a research proposal (RP) similar to those submitted to NSF (National Science Foundation) in order to get financial support. The reason he's doing this is to get us familiar with asking for money :) Everybody needs money. I need money to study but unfortunately it's not going to fall into my pockets. NSF is one of potetntial financial cows to dry off, so it'd be stupid to not explore this opportunity in the future.&lt;br /&gt;Surely we're not going to submit what will be written in next few days - it'll be rubbish. But! It can give us good ideas of what it takes and what might help you to succeed down the road.&lt;br /&gt;My personal problem is that I do not know what my area would be. Larry has suggested some research in the area of managing impreciseness when building (learning) ontologies for Semantic Web. This sounds pretty scientific but I have a little understanding of the current "state-of-art" - what research is underway, which directions are promising which are not, which attempts have been futile, who made any breakthroughs on what and when, etc, etc, etc... Knowing this stuff requires lots of readings - technical papers, journals, books, conferences proceedings... I simple haven't had a chance to go through that (hum, I could use my XMas break to improve on that) and this's what stops me from progressing.&lt;br /&gt;However it's not a good time to panic (in fact, i don't know any time suitable to fall into a depression). I used Thanskgiving break (4 days) to surf web sources looking for any writings concerned with the area. I did find quite a few of them (43Mb of tough technical stuff). The main idea I got out of that is the possibility to employ Fuzzy Sets (and Logic) techniques for representing inherently vague information. Many people believe this is the most natural way of dealing with categories commonly used by humans that are not clear, formal creatures (unfortunately!). Therefore if we're going to mine textual sources (like most of Web's content) in order to build conceptualizations (ontologies) they are also going to be imprecise. Not only because we're uncertain in their borders (although we are) but also because the borders can't be precise even if we have full information about the Universe. That's why we're looking at Fuzzy stuff not at any means of dealing with uncertainty (Bayesian networks, Dempster-Shafer, etc).&lt;br /&gt;So at least I know now what I'm going to read and where (very roughly but still) I'm going to apply my modest brain.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113321959977728273?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113321959977728273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113321959977728273' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113321959977728273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113321959977728273'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/research-proposal-for-kr-class.html' title='Research Proposal for KR class'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113321830458321277</id><published>2005-11-28T17:46:00.000-05:00</published><updated>2005-11-28T17:51:44.636-05:00</updated><title type='text'>Final DB exam</title><content type='html'>Very last and comprehensive DB exam is going to be held on Thursday.. I really need above 90% on tha given my awful perofrmance of first two attemps. But being pinned down doesn't give me any confidence, I don't see how I'm better prepared than I was 4-5 weeks ago. Still having troubles with fairly vague and imprecise questions that allow multiple answers and God knows which one is going to be considered correct (or Larry might know as well). Nonetheles let's stop crying and do what can be done here...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113321830458321277?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113321830458321277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113321830458321277'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/final-db-exam.html' title='Final DB exam'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113219123744541175</id><published>2005-11-16T20:31:00.000-05:00</published><updated>2005-11-16T20:33:57.456-05:00</updated><title type='text'>"Automata.." looks good now!</title><content type='html'>Suprisingly enough Dr. Schmidt indeed gave me 95% on the recent exam. That's really unexpected given I failed on the last problem but he took off only 5 points.. Anyhow - it's really nice because now I'm clearly on the right track to A (also got 19 out of 20 on the last hwrk). Way to go!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113219123744541175?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113219123744541175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113219123744541175' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113219123744541175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113219123744541175'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/automata-looks-good-now.html' title='&quot;Automata..&quot; looks good now!'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113167791797776778</id><published>2005-11-10T21:41:00.000-05:00</published><updated>2005-11-10T22:32:38.646-05:00</updated><title type='text'>Bad week</title><content type='html'>Well... it was not a great week in terms of grades received. "Shit happens" (c)...&lt;br /&gt;1. Databases. This is the most disappointing course. 73 points was my grade on the second exam which is just miserable.. awful. The funny thing is that I got 15 (!!!) points taken off for not underlying primary keys on normalization problems. Those were hardest and only a few people solved them.. i did. But no keys means zero points although the decompositions were impeccable. Apart of that I made 2 real mistakes and 2 questions I'll probably ask to regrade although it's not gonna change a lot. I'm in trouble in this class, 3rd exam will be crucial for me.&lt;br /&gt;2. Automata. Midterm was held on Tuesday and was probably graded by Prof. Schmidt (not by the TA). Results were posted on Web just couple of hours ago and my score is unbelievable 95/100 (average is 75 including undergraduates who get full credit for 1 skipped problem). I suspect it's a mistake. I couldn't earn 95/100, I failed on 1 problem and 2 other solutions were fairly dubious. So, no joy so far...&lt;br /&gt;3. AI. We had 2 quizes (small tests) and some homeworks. I made mistakes on both quizes (quite stupid) but got 9/10 for first one and 100% result on homeworks. I think Dr.Bhatnagar is too loyal when grading...&lt;br /&gt;4. Knowledge Representation. We (I and Julia) were giving a talk today on "Structured Descriptions" which is a branch of logic used to automate the process of organizing logical concepts into hierarchical taxonomies. The talk went quite well (apart of some purely technical issues), I got not a bad reference from Larry (93/100, understanding - excellent, presenting - good). So KR is fine so far but later on we'll have another talks (on ontologies), an oral exam and some sort of Research Proposal. Which must be good for me given my terrible performance in DB class...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113167791797776778?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113167791797776778/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113167791797776778' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113167791797776778'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113167791797776778'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/bad-week.html' title='Bad week'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113112616093075652</id><published>2005-11-04T12:38:00.000-05:00</published><updated>2005-11-04T12:42:40.933-05:00</updated><title type='text'>Automata's exam coming</title><content type='html'>Now it came to the real exam. Big test, short time, long calculations and no questions to skip (this is only possible for undergrads). I'm especially concerned with long transformation of an NFA to the minimal DFA - the kind of problems where extremely easy to make a mistake. Another popular sort of problems is Pumping Lemma applications, problems which require a lot of experience to quickly take right strategy but which don't usually involve tedious calculations.&lt;br /&gt;    To sum up, I'm going to sacrifice my WE in behalf of this crap... Which is useful anyway as long as Automata's questions will be in DQE.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113112616093075652?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113112616093075652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113112616093075652' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113112616093075652'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113112616093075652'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/automatas-exam-coming.html' title='Automata&apos;s exam coming'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113112586709852444</id><published>2005-11-04T12:30:00.000-05:00</published><updated>2005-11-05T21:54:07.080-05:00</updated><title type='text'>DB exam has gone. Bad expectations</title><content type='html'>Well, I do not expect anything good from it. I've skipped 4 questions (as required) and all of them were multiple choises questions. The fact is that for them it's possible to receive partial credit whereas other (like more hard normalization problems) are graded on right/wrong basis. Which means that I ran into a lot of risk not skipping them... But I still hope that Larry's gonna give for them more credit, just because they're hard. If that's not the case I'm in trouble because I spent a lot of time on them and not very confident in solutions.&lt;br /&gt;So, will see next week, but anything above 80% would be a real fortune...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113112586709852444?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113112586709852444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113112586709852444' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113112586709852444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113112586709852444'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/db-exam-has-gone-bad-expectations.html' title='DB exam has gone. Bad expectations'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113090739419972784</id><published>2005-11-01T23:47:00.000-05:00</published><updated>2005-11-01T23:56:34.206-05:00</updated><title type='text'>Automata's homework</title><content type='html'>38 out of 44.. brr.. everything under 90% is terrible. Especially when getting points taken off for nothing.&lt;br /&gt;    However, I must be wrong because Dr. Schmidt is an extremely precise and accurate person (he's German, btw). But I really don't see where I'm mistaken in the following exercise: "prove that the language L={www(w)r; w belongs to {a,b}*} isn't regular." w(r) is an inverted w.&lt;br /&gt;    The way it's expected to be proved is applying the Pumping Lemma. However it might be tedious to do so directly, that's why I've taken another way. Let's define some terms first: tail(L) is a language such that: {w: xw belongs to L for some x - any sequence of terminals}. It has been proved that if L is regular, tail(L) is regular too (by construction).&lt;br /&gt;    It has also been proved (by Schmidt) that w(w)r is not regular (using Pumping Lemma). But w(w)r is a language from tail(L)! Which would be regular provided L is regular, right? Then L must be irregular... This is my way of proving this.&lt;br /&gt;    However, TA has taken 2 points off arguing that Pumping Lemma is expected. Any ideas where I'm mistaken?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113090739419972784?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113090739419972784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113090739419972784' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113090739419972784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113090739419972784'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/11/automatas-homework.html' title='Automata&apos;s homework'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113073639807499719</id><published>2005-10-31T00:10:00.000-05:00</published><updated>2005-10-31T00:26:38.083-05:00</updated><title type='text'>After DB review session</title><content type='html'>Ok, we had the review session today and I addressed my questions about overlapping keys to Larry. As I predicted, he didn't give me an answer, rather claimed that any relation must be in BCNF as long as there're no overlapping keys. On my example with CK1 (A,B), CK2 (C,D) and C-&gt;B he said that a partial dependency C-&gt;B would break even 2NF! This is not true (imho) because all authors mention that 2NF and 3NF do NOT deal with prime attributes (those that are parts of a key). But B is a part of a key... But for Larry it's easier to not see that, then he claims that as long as 2NF is violated, BCNF can never be satisfied. Unfortunately if this were true his BCNF examples would not work either, but I wasalready  too tired to argue that.. no way.&lt;br /&gt;    But the good thing is that I borrowed an Ulman book from Larry and this guy seem to deal with  restricted 2(3)NF in a most graceful way - he defines only 3NF and &lt;span style="font-style: italic;"&gt;after &lt;/span&gt;BCNF!! But then he nevertheless states that if an attribute is prime a relation may still be in 3NF even though if this attribute depend on non-key. He-he, I'll send this to Larry.. maybe..&lt;br /&gt;    And finally, Jeffery Ulman &lt;span style="font-weight: bold;"&gt;never ever&lt;/span&gt; says something about overlapping candidate keys! He just goes around this.. which implies that this's not mandatory for violating BCNF (again - imho). Interesting!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113073639807499719?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113073639807499719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113073639807499719'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/after-db-review-session.html' title='After DB review session'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113060634054468372</id><published>2005-10-29T12:14:00.000-05:00</published><updated>2005-10-29T12:19:00.556-05:00</updated><title type='text'>Further confusion with BCNF</title><content type='html'>BCNF &lt;=&gt; any left-irreducible, non-trivial dependency has a candidate key as its determinant. Right? Less formally it means that any determinant is a CK. Ok.&lt;br /&gt;But who can explain me why CK must overlap to violate BCNF?! How does it entail from the formal definition?&lt;br /&gt;R (...A,B...C,D....). CK1(A.B). CK2(C,D). But! C-&gt;B. Does that mean that R is not in 3NF due to transitive dependency (A,B)-&gt;C and C-&gt;(B)? If yes - I don't understand why BCNF examples do not violate 2NF, for instance... If no - we don't need overlapping to violate BCNF. Heeey! Help me, Codd....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113060634054468372?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113060634054468372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113060634054468372' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113060634054468372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113060634054468372'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/further-confusion-with-bcnf.html' title='Further confusion with BCNF'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113055762644930326</id><published>2005-10-28T22:46:00.000-05:00</published><updated>2005-10-28T22:47:06.450-05:00</updated><title type='text'>Minor correction</title><content type='html'>I'm correcting a minor mistake here. The relation I mentioned in previous email was in BCNF. But the problem persists if we define FD2: dealer -&gt; car. Then the relation is not in BCNF (dealer is not a CK) and decomposing will lead to problems with dependency preservation...&lt;br /&gt;     In fact, the case that I mistakenly considered first, is still interesting... It's in BCNF (because both determinants are CK) and it eliminates update anomalies. But what about delete anomalies? Date seem to be silent here... But if we remove the record for some customer, we may loose the information that certain dealer sells certain model, correct? So we still have problem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113055762644930326?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113055762644930326/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113055762644930326' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113055762644930326'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113055762644930326'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/minor-correction.html' title='Minor correction'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113054885148174750</id><published>2005-10-28T20:15:00.000-05:00</published><updated>2005-10-28T20:20:51.493-05:00</updated><title type='text'>BCNF: A letter to Larry</title><content type='html'>I'm preparing for DB exam, playing with Boyce-Codd Normal Form.. I'm totally confused. This is my email to Larry. He's forgotten to point out this! Of course he has...&lt;br /&gt;&lt;br /&gt;"&lt;br /&gt;Dr. Mazlack--  &lt;br /&gt;&lt;p class="MsoNormal" style="text-align: justify;"&gt;&lt;br /&gt;It seems that I'm going crazy... I just spent 3-4 hours today playing with BCNF and I think something must be terribly wrong with that. And you've probably forgotten to tell us about this... (I hope you didn't do it intentionally, otherwise it's too cruel...)&lt;br /&gt;Consider your example with relation R: (customer, car, dealer). FD1: (customer, car)-&gt;dealer; FD2: (customer, dealer)-&gt; car. It's clearly in 3NF, not in BCNF, right?&lt;br /&gt;What can we infer from this definition? Well, it's pretty obvious - &lt;b&gt;one model of a car cannot be sold to one person by two distinct dealers. &lt;/b&gt;That's why FD1 holds. The same can be said about FD2.&lt;br /&gt;Ok, everything was simple until this point. Now we decompose: R1(customer, car), R2(car, dealer). There're no simple determinants in both relations but they're both in BCNF, right? Is the decomposition loss-less? Sure it is.&lt;br /&gt;I'm 100% confident that by this point you have realized what my concern is. Does FD1 still hold? Nope.&lt;br /&gt;I was stupid enough spending hours looking into this today (no Date nearby...). Then I gave up and started googling and found a couple of interesting papers from experts. It appeared that the situation was well described before (of course!!) and named &lt;b&gt;dependency preservation.&lt;/b&gt; Which means that we’d like to be able to reconstruct the original set of functional dependencies after re-joining relations (those that emerged after the decomposition)? Other people might instead say that those relations must be &lt;b&gt;independent if Rissanen's sense. &lt;/b&gt;Which merely means there should not be any implicit dependencies (like FD1 which now wires R1 and R2).&lt;br /&gt;    The obvious question is: where's a loss-less decomposition that would preserve dependencies? It doesn't exist. It is just as simple as that. No justification for 3 hours passed way...&lt;br /&gt;    Less obvious question is: how can we find if any nice (you understand what I mean by nice..) decomposition exists? After another search I found that it was proved by Tsou and Fischer (paper attached) that this is an NP-hard problem. Which is probably not as bad as it sounds (nobody would decompose a table with hundreds of attributes!) but rather reveals some ugliness of BCNF.. . Which gives rise to my final question (only for today!):&lt;br /&gt;    Is it worth to decompose relations into BCNF? Yes, I know that you can draw plenty of examples with some anomalies eliminated and so on... But what about less simple cases (it's my turn to say that we are talking about pure theory here, that's why I'm concerned with relations with many attributes, many FD that are going to be broken after loss-less decomposition)?&lt;br /&gt;    It's interesting to see what experts say about that (oh, I really should check with my favourite Date first, but I'm impatient that's why I'm writing you before). Bernstein (a man from Harvard), for example, claims &lt;b&gt;BCNF does not meet its goals except of trivial cases. &lt;/b&gt;That's a strong claim, isn't it? It's a nice, extremely formal and rigorous paper, worth to read (not easy though).&lt;br /&gt;    I don't have a conclusion here... Something seems to be wrong (or at least inadequate) with normalization theory.. Or I'm blind and therefore really deserve my 70/100 (which is F) for that class..&lt;br /&gt;&lt;br /&gt;   Thank you if you read until this point. Any comments? I would avoid raising this matter on the review session.. for obvious reasons.&lt;br /&gt;&lt;br /&gt;--pavel&lt;br /&gt;"&lt;br /&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="text-align: justify;"&gt;Here're references to papers:&lt;br /&gt;1. P. Bernstein, N. Goodman. "What does Boyce-Codd normal form do?"&lt;br /&gt;2. D.M. TSou, M. Fischer. "Decomposition of a relation scheme into Boyce-Codd normal form"&lt;br /&gt;&lt;/p&gt; &lt;p class="MsoNormal" style="text-align: justify;"&gt;A curious reader may check them out... Or I can send pdfs. It's a worth reading!! :)&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113054885148174750?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113054885148174750/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113054885148174750' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113054885148174750'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113054885148174750'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/bcnf-letter-to-larry.html' title='BCNF: A letter to Larry'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113051271309593942</id><published>2005-10-28T10:10:00.000-05:00</published><updated>2005-10-28T10:18:33.096-05:00</updated><title type='text'>DB second exam approaching...</title><content type='html'>Larry announced yesterday that next Thursday (Nov 4) we'll have second exam (the content of first one + relational algebra, calculus, constraints &amp; normalization). That sounds better for me than first exam but the bad thing is that all exams are inclusive.&lt;br /&gt;    Regarding normalization &amp; constraints I seem to disagree with Larry on certain points again.. First, on his statement about foreign keys, he said they should match PK whereas I'm pretty sure that this condition could be relieved (they could match CK too, see Date, Horowitz, etc..). Second, I disagree on the point that FK could be wholly null and still be consistent - that's an awful extension, see Date for the complete list of possible problems arising here (update/delete ambiguity, others). Other things are not so important (like his error prone loss-less decomposition).&lt;br /&gt;    Nonetheless I realize that the exam is crucially important (given my 70 on first one) and I need to study a lot to get an acceptable result. At least because Larry expects a lot from me and that would be a great shame to fail on DB after having all those discussions with him... Hey, I'm not a bad DB developer after all!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113051271309593942?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113051271309593942/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113051271309593942' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113051271309593942'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113051271309593942'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/db-second-exam-approaching.html' title='DB second exam approaching...'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113051218782155269</id><published>2005-10-28T10:05:00.000-05:00</published><updated>2005-10-28T10:09:47.830-05:00</updated><title type='text'>AI midterm went fine!</title><content type='html'>Sometimes nice things happen too - Raj (Dr. Raj Bhatnagar - AI's instructor) has given out midterm results and I got 96 out of 100 :) To be honest, it was quite simple test and i think quite a few people earned good grades (over 90). But by no means it makes me less happy!  Now I'm in good position towards this class and I think "A" is feasible here... The key is to not fail on final..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113051218782155269?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113051218782155269/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113051218782155269' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113051218782155269'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113051218782155269'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/ai-midterm-went-fine.html' title='AI midterm went fine!'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113029500115999037</id><published>2005-10-25T21:30:00.000-05:00</published><updated>2005-10-25T22:35:43.210-05:00</updated><title type='text'>Starting point</title><content type='html'>So, eventually I need smth to start from. That's pretty easy, I'll start from my current situation (what things look like) and we'll see how they will go...&lt;br /&gt;So, I'm first year PhD student, my research interests lie in the area of data mining, pattern recognition, unsupervised taxonomies learning and some areas of AI. To sum up I'm interested in areas where DB and AI intersect. My research advisor is &lt;a href="http://www.ececs.uc.edu/~mazlack"&gt;Dr. Lawrence Mazlack&lt;/a&gt; (referred as Larry so forth).&lt;br /&gt;I'm taking 4 classes this semester (Fall 2005) :&lt;br /&gt;1. AI (Artificial Intelligence). This is AI basic course, we've done "Search techniques" already and had a midterm on that. The results will be known soon. Apart of that things look not so bad, homeworks are fine, quizes not that good but all above the average level.&lt;br /&gt;2. DB (Database Theory). Brr... That is a difficult matter. I do not like how Larry teaches this class. Many things don't get along with my previous experience (like indexes), some things are not covered at all (like data structures, B*-trees, other index matters like selectivity, bookmarks, look-ups, etc), some other seem to be invented by Larry and don't have any applications in the real life (like bubble charts - I hate them!). Of course that doesn't excuse my awful 70 out of 100 on first exam but rather explains why I'm not happy about the material...&lt;br /&gt;3. Automata (Theory of Automata and Formal Languages). This is not an easy course, Dr. Schmidt is a serious guy, he is strict and precise and the main reason why I'm taking A&amp;amp;FL from him is that he's writing questions for the DQE (I hope I'll have time later on to explain what DQE is - in short, it's my main challenge for the time being). I'm doing HWs not badly for this class, but the major thing is midterm expected on Nov 8.&lt;br /&gt;4. KR (Knowledge Representation). Second Larry's class. It's a strange class, it's so much informal that you're tempted to get relaxed but I suspect that would be a huge blunder... According to Larry we need to go through a collection of things to get a grade for the class. First we'll do 2 presentations (one based on the textbook, my topic is smth about Machine Learning - together with Julia (Julia is a classmate of mine), second based on some ontology paper), then we'll have an Oral exam, and finally - a sort of a Grant proposal. I suppose it's not gonna be a real, sound proposal but rather smth to delineate our interests in the area of knowledge representations, semantic web, reasoning... I have an impression it's better to be smth good..&lt;br /&gt;So, that's what I'm beginning my blog. Later, I'll check in big news regarding the courses above as well as any other academic events which are of some significant importance...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113029500115999037?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113029500115999037/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113029500115999037' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113029500115999037'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113029500115999037'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/starting-point.html' title='Starting point'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113029380324393072</id><published>2005-10-25T21:25:00.000-05:00</published><updated>2005-10-29T21:34:11.713-05:00</updated><title type='text'>Some ideas about the content..</title><content type='html'>Well, I think the blog will be mostly about my academic activities here at &lt;a href="http://www.uc.edu/"&gt;UC&lt;/a&gt;.. The important thing is that the blog is aimed at helping me to keep track of my academic goals and intentions and is not supposed to be a place for anything else. Yes, I know, that might seem boring but that's how it's gonna be. I think it might be interesting to skim through it after a while to see how things have evolved.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113029380324393072?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113029380324393072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113029380324393072' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113029380324393072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113029380324393072'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/some-ideas-about-content.html' title='Some ideas about the content..'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18292881.post-113028333043262329</id><published>2005-10-25T18:33:00.000-05:00</published><updated>2005-10-25T18:35:30.440-05:00</updated><title type='text'>Very first post</title><content type='html'>Hey all&lt;br /&gt;That's supposed to be my very first post here.. so I'm not even sure what I need this blog for. But let's hope smth useful bubbles up sooner or later :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18292881-113028333043262329?l=klinov.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://klinov.blogspot.com/feeds/113028333043262329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18292881&amp;postID=113028333043262329' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113028333043262329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18292881/posts/default/113028333043262329'/><link rel='alternate' type='text/html' href='http://klinov.blogspot.com/2005/10/very-first-post.html' title='Very first post'/><author><name>Pavel Klinov</name><uri>http://www.blogger.com/profile/03151624478275452894</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
