The Analytical Kaizen of D. Alan Ridgeway
Blogging about tools, skills and insight related to Operations Research, Economics, History, Risk, IT Security or any topic with analyzable data for the purpose of my own kaizen.
Monday, February 20, 2012
Georg Cantor and the foundations of Set Theory.
Since I am on the topic of Venn diagrams, I decided to review the history of "set theory". A Venn diagram is just a graphical representation of sets of data, so producing an accurate Venn diagram requires understanding the underlying data. While the sets shown in Venn diagrams are normally finite, the theory of sets, and mathematicians' interest in developing it, grew out of debates over the possibility of infinity. The German mathematician Georg Cantor created the concept of set theory. Before Cantor, mathematicians and philosophers recognized collections of objects but generally treated them as meaningful only in limited (finite) amounts. Cantor's work showed that it is possible to have infinitely many elements within a set. He also proved that not all infinite sets are equinumerous: if two sets are presented and both are infinite, it is still possible that they do not have the same "amount" of elements. Part of his proof showed that although there are infinitely many real numbers and infinitely many natural numbers, there are more real numbers than natural numbers. His proofs were controversial among the philosophers of his day, but later mathematicians embraced his work and developed it into the modern concept of set theory. Thus what is normally taught as a beginning skill in statistics actually has its roots in a topic that changed philosophy and the mathematical world view.
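In modern notation, which post-dates Cantor's own papers, the result about the natural and real numbers can be stated compactly; the outline that follows is only a sketch of the diagonal argument, not his original presentation.

|\mathbb{N}| = \aleph_0 < 2^{\aleph_0} = |\mathbb{R}|

Sketch: suppose the real numbers between 0 and 1 could be written out in a complete list r1, r2, r3, ...; build a new number whose nth decimal digit differs from the nth digit of rn. That number differs from every entry in the list, so no such list can be complete, and the reals cannot be put in one-to-one correspondence with the naturals.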
For deeper information on Georg Cantor, see the Wikipedia article, or read a biography.
MIT OCW Exercise for Ch 1.
Suppose that A ⊂ B. Show that Bᶜ ⊂ Aᶜ.
The argument is short: if x ∈ Bᶜ then x ∉ B, and since A ⊂ B it follows that x ∉ A, so x ∈ Aᶜ. In the diagram, the subset Bᶜ is the white area, while Aᶜ covers both the white and yellow areas.
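For reference, here is a minimal sketch of how a nested two-set diagram like this can be drawn with draw.pairwise.venn from the VennDiagram package; the areas, category names, and colors are illustrative placeholders rather than the exact values behind the posted figure.

library(VennDiagram)
grid.newpage()
# Illustrative values only: A sits entirely inside B,
# so the overlap area equals the area of A.
draw.pairwise.venn(
  area1      = 100,               # |B| (placeholder)
  area2      = 30,                # |A| (placeholder)
  cross.area = 30,                # |A and B| = |A| because A is a subset of B
  category   = c("B", "A"),
  fill       = c("yellow", "white"),
  euler.d    = TRUE,              # allow the nested (Euler) layout
  scaled     = TRUE
)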
I would like to thank DWin from StackExchange for his guidance on the VennDiagram package.
Improvements I should consider for this diagram:
- Add the label for universal set U (which would equal the area for Bᶜ).
- Fill in the universal set U with a color other than white.
- Remove the number values as they are not important for this exercise.
Monday, February 13, 2012
MIT Open Courseware statistics class, and GNU R
It has been a long time since I worked any serious statistical problems. To update my skills on my own time, I am participating in the MIT OpenCourseWare undergraduate-level statistics class, but with a twist: I will use GNU R and its packages for the exercises within the book. I have been using GNU R on and off for my own use, and this gives me a better opportunity to build skills with GNU R while strengthening my statistical skills.
So the first chapter in the course book covers Venn diagrams. While simple Venn diagrams are easy to produce, readable complex ones require either some artistic skill or the assistance of a program designed to make them. GNU R seems to have two popular packages for this task: VennEuler and VennDiagram.
VennEuler is good for getting started with semi-simple diagrams. As the name suggests, it handles both Venn and Euler diagrams. The two are similar; the main difference is how regions with no data are handled once three or more sets overlap: a Venn diagram draws every possible intersection even when it is empty, while an Euler diagram draws only the regions that actually contain data. BMC Bioinformatics posted an example comparing the two kinds of diagram below.
The GNU R code below for a simple Venn diagram
require(venneuler)
# Two sets of 200 elements each, sharing 100 elements
v <- venneuler(c(A = 200, B = 200, "A&B" = 100))
# Relabel the circles
v$labels <- c("Green", "Blue")
plot(v)
# Add an annotation at plot coordinates (0.5, 0.6)
text(.5, .6, "my text here")
produces the following Venn diagram.
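To make the Venn-versus-Euler distinction mentioned above concrete, here is a small sketch with made-up set sizes: three sets where B and C never overlap. Because venneuler draws area-proportional Euler diagrams, the empty B and C intersection simply does not appear; a true Venn diagram would still reserve a region for it.

require(venneuler)
# Three illustrative sets: A overlaps B and A overlaps C,
# but B and C have no elements in common, so venneuler's
# area-proportional Euler layout shows no B-C overlap at all.
v2 <- venneuler(c(A = 100, B = 100, C = 100,
                  "A&B" = 40, "A&C" = 40))
plot(v2)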
While VennEuler is good for simple Venn diagrams, the package VennDiagram gives the user greater control over the diagrams. Here is an example of a complex Venn diagram from BMC Bioinformatics.
The code for this diagram follows:
library(VennDiagram)
# Draw a four-set Venn diagram and write it straight to a TIFF file
venn.diagram(
  x = list(
    I   = c(1:60, 61:105, 106:140, 141:160, 166:175, 176:180, 181:205, 206:220),
    IV  = c(531:605, 476:530, 336:375, 376:405, 181:205, 206:220, 166:175, 176:180),
    II  = c(61:105, 106:140, 181:205, 206:220, 221:285, 286:335, 336:375, 376:405),
    III = c(406:475, 286:335, 106:140, 141:160, 166:175, 181:205, 336:375, 476:530)
  ),
  filename = "1D-quadruple_Venn.tiff",
  col = "black",
  lty = "dotted",
  lwd = 4,
  fill = c("cornflowerblue", "green", "yellow", "darkorchid1"),
  alpha = 0.50,
  label.col = c("orange", "white", "darkorchid4", "white", "white", "white",
                "white", "white", "darkblue", "white", "white", "white",
                "white", "darkgreen", "white"),
  cex = 2.5,
  fontfamily = "serif",
  fontface = "bold",
  cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"),
  cat.cex = 2.5,
  cat.fontfamily = "serif"
)
Reviewing the sample code and the available documentation for both packages shows that VennDiagram offers a larger set of arguments for granular control over the diagram. As I work through the exercises in the statistics book, I will try a mix of diagrams from each package and post examples of what I learned from the chapter and from using both GNU R packages.
Tuesday, June 28, 2011
Differences in Linear Programming models
With my mind back on the Science of Decision Making book, I have two different ways to build the same linear programming model. The book describes how to do so using Excel, so below is a screenshot of the model in Excel before it is solved. Cells E3:E8 each use the SUMPRODUCT function on the related values in columns B, C, and D.
The Solver is set to maximize E8, subject to
$B$9:$D$9 >= 0
$E$3:$E$7 <= $G$3:$G$7
with $B$9:$D$9 as the changing cells.
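Written out in standard form (using the coefficients from the GLPK model later in this post, which should be the same values the spreadsheet's SUMPRODUCT formulas encode), the model is:

\begin{aligned}
\max \quad & z = 840x_1 + 1120x_2 + 1200x_3 \\
\text{s.t.} \quad & 3x_1 + 2x_2 + x_3 \le 120 && \text{(engine shop)} \\
& x_1 + 2x_2 + 3x_3 \le 80 && \text{(body shop)} \\
& 2x_1 \le 96, \quad 3x_2 \le 102, \quad 2x_3 \le 40 && \text{(finishing)} \\
& x_1, x_2, x_3 \ge 0
\end{aligned}

where x1, x2, and x3 are the numbers of Standard, Fancy, and Luxury units to produce.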
Running the Solver results in the answer below.
Produce
20 of S (Standard)
30 of F (Fancy)
0 of L (Luxury)
Now to do the same problem using the GNU Linear Programming Kit (GLPK), the model is built as follows.
--------
# This finds the optimal solution for maximizing the RV plant's profit
#
/* Decision variables */
var x1 >=0; /* Standard */
var x2 >=0; /* Fancy */
var x3 >=0; /* Luxury */
/* Objective function */
maximize z: 840*x1 + 1120*x2 + 1200*x3;
/* Constraints */
s.t. Engine_Shop : 3*x1 + 2*x2 + 1*x3 <= 120;
s.t. Body_Shop : 1*x1 + 2*x2 + 3*x3 <= 80;
s.t. Standard_Fin : 2*x1 + 0*x2 + 0*x3 <= 96;
s.t. Fancy_Fin : 0*x1 + 3*x2 + 0*x3 <= 102;
s.t. Luxury_Fin : 0*x1 + 0*x2 + 2*x3 <= 40;
end;
--------
While spreadsheet fans may find this slightly harder to follow, the format in this tool is closer to how a student defines a linear programming problem in class, only slightly more complex because of the scripting syntax. GLPK produces the following report after running "glpsol -m glpk_Science_of_decision_Ch001.mod -o test.sol":
--------
Problem: glpk_Science_of_decision_Ch001
Rows: 6
Columns: 3
Non-zeros: 12
Status: OPTIMAL
Objective: z = 50400 (MAXimum)
   No.   Row name   St   Activity     Lower bound   Upper bound    Marginal
------ ------------ -- ------------- ------------- ------------- -------------
     1 z            B          50400
     2 Engine_Shop  NU           120                         120           140
     3 Body_Shop    NU            80                          80           420
     4 Standard_Fin B             40                          96
     5 Fancy_Fin    B             90                         102
     6 Luxury_Fin   B              0                          40

   No. Column name  St   Activity     Lower bound   Upper bound    Marginal
------ ------------ -- ------------- ------------- ------------- -------------
     1 x1           B             20             0
     2 x2           B             30             0
     3 x3           NL             0             0                        -200
Karush-Kuhn-Tucker optimality conditions:
KKT.PE: max.abs.err. = 0.00e+000 on row 0
max.rel.err. = 0.00e+000 on row 0
High quality
KKT.PB: max.abs.err. = 0.00e+000 on row 0
max.rel.err. = 0.00e+000 on row 0
High quality
KKT.DE: max.abs.err. = 1.14e-013 on column 1
max.rel.err. = 1.35e-016 on column 1
High quality
KKT.DB: max.abs.err. = 0.00e+000 on row 0
max.rel.err. = 0.00e+000 on row 0
High quality
End of output
--------
While the report includes a greater amount of information (the Marginal column, for instance, reports each constraint's dual value, or shadow price), we can still see the same results.
The company should produce
20 of the standard
30 of the fancy
0 of the luxury
The Google Docs spreadsheet looks almost the same as Excel for the purposes of this post, so a screenshot of it is excluded. The only difference between the two for this example is how the constraints are set up. See the previous post for the details.
Resolved Google Docs solver issue
Well, it seems the Solver in the Google Docs spreadsheet can produce the same answer as Excel.
The issue comes down to how you enter the constraints. Excel accepts a constraint expressed as a range, $E$3:$E$7 <= $G$3:$G$7, but in the Google Docs spreadsheet a range cannot be used; each constraint pair must be entered separately.
Hence
E3 <= G3
E4 <= G4
E5 <= G5
E6 <= G6
E7 <= G7
So as I make my way through the book and try the examples in both Excel and Google Docs, I am sure I will find other issues.
Saturday, June 11, 2011
Using Solver tool in Excel and Google Docs
While I was working on the first problem in The Science of Decision Making book, it dawned on me that not only does Excel have a Solver tool for linear programming problems, but the Google Docs spreadsheet also has a solver. So I tried the problem in both tools. Unfortunately, I got different results. I repeated the same steps several times to confirm whether I had made a mistake; so far the two do not reconcile.
Results in Excel (which match the results in the book)
So I will email Google to find out if there is something I am missing.
Science of Decision Making book and the many tools to optimize a decision
One of the great benefits of blogging is stating a goal and using the public announcement to motivate you to finish it. If the writer (me) does not accomplish the goal, it is embarrassing and makes for rather boring blogging. Hence my first post is about my goal to complete the work in the book The Science of Decision Making: A Problem-Based Approach Using Excel. Only in my case, I am looking to work the examples in both Excel and the GNU Linear Programming Kit (GLPK).
So why do I want to do this? In my past experience with computer security, I found that many peers, journals, and experts talked about risk, but few could discuss it the way an insurance actuary does. Many expert technology companies and government institutions have a difficult time determining risk and making optimal decisions. I do not claim to be any better at making decisions than the management of these organizations at this time. My interest is in building skills that could let me assess security risks in a manner similar to an insurance company, and I hope those skills will give me an advantage in any complex decision I need to make. At the very least, I can use the book's content to build decision-making skills in a way that is rational and provable. Of course, if data collection and life were perfect, I could use these skills to solve every problem thrown at me like Charlie Eppes in the TV show Numb3rs, but the realistic goal is to match the accuracy of insurance companies for IT security and other complex decisions.