ENTRIES TAGGED "OSCON"

Prefer goals over controls

A common vision is more important than seeking approval

Bruce Eckel is well known for his books in the field of programming, such as Thinking in Java, Thinking in C++, and Atomic Scala, as well as for co-organizing the Java Posse Roundup conference. And yet, on top of his work in programming, he has spent the last several years investigating and researching a topic that seems quite different, yet is intertwined with the destiny of software companies: the culture and operation of businesses.

In his OSCON 2013 Mainstage session, Bruce lays out the foundations of modern business management, including a look back as far as Taylorism, a survey of what it means to get an MBA, and ways to gain this knowledge outside of the traditional institutions.

Read more…

Proposing a Compelling OSCON Talk

Intriguing reviewers and attendees

The OSCON call for proposals closes on January 30th. I’ve seen some great explanations of how to write a conference proposal generally, and my co-chair Matthew McCullough has an excellent presentation on how to write proposals and presentations:

Tailoring talks to specific conferences sometimes takes a bit more. For OSCON, we’d like talks centered on the Open Source computing ecosystem, but the call just lays out the broad story. What helps a proposal that’s in that neighborhood become a session? If you have an idea, how do you develop it?

Audience: Who are they? What do they want?

My first suggestion to anyone proposing a talk (or a book, or even a blog post) is to focus on audience. Who is going to be interested in what you want to discuss? Will they be at that event? What should they know before they get there? How can you convince them that it’s worth their time to join your conversation? Even for lectures and books, thinking of it as a conversation helps to focus planning.

Read more…

Tim O’Reilly Urges Developers and Entrepreneurs to Make Moonshots

OSCON Mainstage Talks: Create more value than you capture

At OSCON 2013, Tim (@timoreilly) asked us to aim higher and work on difficult problems while highlighting the most important trends that should be guiding open source developers and entrepreneurs. To illustrate his points, he offered up great examples from companies as diverse as Google, Square, Wikipedia, and O’Reilly Media.
Read more…

Will Developers Move to Sputnik?

The past, present, and future of Dell's project

Barton George (@barton808) is the Director of Development Programs at Dell, and the lead on Project Sputnik—Dell’s Ubuntu-based developer laptop (and its accompanying software). He sat down with me at OSCON to talk about what’s happened in the past year since OSCON 2012, and why he thinks Sputnik has a real chance at attracting developers.

Key highlights include:

  • The developers that make up Sputnik’s ideal audience [Discussed at 1:00]
  • The top three reasons you should try Sputnik [Discussed at 2:46]
  • What Barton hopes to be talking about in 2014 [Discussed at 4:36]
  • Why documentation is the key to building a community [Discussed at 5:20]

You can view the full interview here:

Read more…

In Praise of the Lone Contributor

The O'Reilly Open Source Awards 2013

Over the years, OSCON has become a big conference. With more than 3,900 people registered this year, it was hard not to look at the packed hallways and sessions and think what a huge crowd it is. The number of big-name companies participating (Microsoft, Google, Dell, and even General Motors) reinforces the popular refrain that open source has come a long way; it’s all mainstream now.

Which is as it should be. And it’s been a long haul. But thinking of open source in terms of numbers and size puts us in danger of forgetting the very thing that makes open source special, and that’s the individual contributor. So while open source software has indeed found a place in almost every organization that exists, it was made possible by the hard work of real people who saw the need for it, most of them volunteering in their spare time.

The O’Reilly Open Source Awards were created to recognize and thank these individuals. It’s a community-driven effort: nominations come in from the open source community (this year there were around 50) and then are judged by the previous year’s winners. It’s not intended to be political or a popularity contest, but honest appreciation for hard work that matters. Let’s look at this year’s winners.
Read more…

Open Source convention considers situational awareness in cars, and more

A report from OSCON

Every conference draws people who come to make contacts, but the Open Source convention also inspires them with content. I had one friend withdraw from an important business meeting (sending an associate instead) in order to attend a tutorial.

Lots of sessions and tutorials had to turn away attendees. This was largely fallout from the awkward distribution of seats in the Oregon Convention Center: there are just half a dozen ballroom-sized spaces, forcing the remaining sessions into smaller rooms more appropriate for casual meetings of a few dozen people. When the conference organizers measure the popularity of the sessions, I suggest that any session at or near capacity have its attendance counted as infinity.

More than 3,900 people registered for OSCON 2013, and a large contingent kept attending sessions all the way through Friday.

Read more…

Open Source In The Classroom

OSCON 2013 Speaker Series

To teach computer programming to a young person with no experience, you must imagine what it’s like to know nothing about languages, algorithms, data structures, design patterns, object orientation, development tools, etc. Yet the kids I’ve seen in high school over the past several years are so immersed in a world of computing that they have interesting partial understandings of how things work and usually far deeper knowledge of what’s being done with the technologies at a consumer level than their teacher. The contemporary adolescent, a child of the late 1980s and early 1990s, has lived a life that parallels the advent of the web with no experience of a world before such powerful new verbs as email, Google, or blog. On the other hand, most have never seen the HTML of a web page or the IP address of a networked computer, and none of them can guess what a compiler does. As a high school computer programming teacher, I am usually the first to help them bridge that enormous and growing gap, and it doesn’t help that the first thing they ask is “How do I write an app for my phone?”

So I need to remember what it was like for me to learn programming for the first time given that, as a child of the early 1970s, my own life parallels the advent of the home computer. I go backwards in time, way before my modern toolkit mainstays like Java, XML, or Linux (or their amalgam in Android). Programming days came before my Webmaster days, before my analyst days, before I employed scripting languages like Perl, JavaScript, Tcl, or even the Unix shells. I was programming before college, where I used older languages like C, Lisp, or Pascal. When I fondly reminisce about banging out BASIC programs on my beloved Commodore-64, I think surely I’ve arrived at the beginning, but no. When I really do recall the very first time I wrote an original computer program, that was even longer ago: it was 1984. I remember my seventh grade math class, in a Massachusetts public school, being escorted into a lab crammed with no more than a dozen new Apple IIe computers. There we wrote programs in a language called Logo, although everybody just called it “turtle”.

Logo, first created in Cambridge, MA in 1967, was designed from the outset as a programming language for teaching about programming. The open source KDE education project includes an application, KTurtle, for writing programs in Turtlescript, a modern adaptation of Logo. I have found KTurtle to be an extremely effective introduction to computer programming in my own classroom. The complete and well-written Turtlescript language reference can be printed in a small packet of about eight double-sided pages, which I can distribute to each pupil. Your first view is a simple IDE, with a code editor that does multi-color highlighting, an inspector for tracing variables and functions, and a canvas for displaying the output of your program. I can demonstrate commands in seconds, gratifying students with immediate results as the onscreen turtle draws lines on the canvas. I can slow the speed of the turtle or step through a program to show what’s going on at each command of a larger program. I can save my final output as an image file, which has limited but definite value to a young person. The kids get it, and with about 15 minutes of explanation, a few examples, and a handful of commands, they are ready and eager to try themselves.

A modern paradigm of teaching secondary mathematics is to give students a sequence of tasks that build a progressive understanding of concepts by putting them in situations where they struggle just enough to realize the need for new insights. If I ask the students to use KTurtle to draw a square, having provided them with only the basic commands for moving the turtle, they can quickly come up with something like this:
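    # a Turtlescript sketch: one command per side; the side length is arbitrary
    forward 100
    turnleft 90
    forward 100
    turnleft 90
    forward 100
    turnleft 90
    forward 100
    turnleft 90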

If the next tasks involve a greater number of adjacent squares, this approach becomes tiresome. Now I have the opportunity to introduce looping control structures (along with the programming virtue of laziness) to show them a better way:
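    # the same square, written with a loop (sketch; side length still arbitrary)
    repeat 4 {
      forward 100
      turnleft 90
    }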

As I give them a few more tasks with squares, some students may continue trying to draw the pictures in one giant block of commands, creating complicated Eulerian trails to trace each figure, but the obvious benefits, and their peers, eventually convince them to make the switch. When they graduate to cutting and pasting their square loop repeatedly, I can introduce subroutines:
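    # sketch: "learn" defines a new command, here wrapping the square loop with a side-length parameter
    learn square $length {
      repeat 4 {
        forward $length
        turnleft 90
      }
    }
    square 50
    square 100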
Read more…

Why Choose a Graph Database

Collaborative filtering with Neo4j

By now, chances are good that you’ve heard of NoSQL, and of graph databases like Neo4j.

NoSQL databases address important challenges that we face today in terms of data size and data complexity. They offer valuable solutions by providing data models suited to each of these dimensions.

On one side of the spectrum, these databases handle scaling out and high data volumes by using compound aggregate values; on the other side sits a relationship-based data model that lets us represent real-world information with high fidelity and complexity.

Neo4j, like many other graph databases, builds upon the property graph model: labeled nodes (for informational entities) are connected via directed, typed relationships. Both nodes and relationships hold arbitrary properties (key-value pairs). There is no rigid schema, but with node labels and relationship types we can have as much meta-information as we like. When importing data into a graph database, the relationships are treated with as much value as the database records themselves. This allows the engine to navigate your connections between nodes in constant time. That compares favorably to the exponential slowdown of many-JOIN SQL queries in a relational database.
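In Cypher, Neo4j’s query language, a tiny slice of such a graph might be created like this (the labels, relationship type, and properties are invented for illustration):

    // two labeled nodes joined by a directed, typed relationship that carries a property
    CREATE (alice:Person {name: 'Alice'})-[:KNOWS {since: 2012}]->(bob:Person {name: 'Bob'})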

How can you use a graph database?

Graph databases are well suited to model rich domains. Both object models and ER-diagrams are already graphs and provide a hint at the whiteboard-friendliness of the data model and the low-friction mapping of objects into graphs.

Instead of de-normalizing for performance, you would normalize interesting attributes into their own nodes, making it much easier to move, filter, and aggregate along these lines. Content and asset management, job-finding, and recommendations based on weighted relationships to relevant attribute-nodes are some use cases that fit this model very well.

Many people use graph databases because of their high-performance online query capabilities. They process large volumes of raw data with Map/Reduce in Hadoop or with event processing (Storm, Esper, etc.) and project the computation results into a graph. We’ve seen examples of this in many domains, from finance (fraud detection in money-flow graphs) and biotech (protein analysis on genome sequencing data) to telco (mobile network optimization based on signal-strength measurements).

Graph databases shine when you can express your queries as a local search using a few starting points (e.g., people, products, places, orders). From there, you can follow relevant relationships to accumulate interesting information, or project visited nodes and relationships into a suitable result.
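For example, a simple collaborative-filtering recommendation can be phrased as exactly this kind of local search in Cypher (the labels, relationship types, and starting person are invented for illustration):

    // start at one person, walk out to what their friends like, and skip what they already like
    MATCH (me:Person {name: 'Alice'})-[:FRIEND]->(friend)-[:LIKES]->(item)
    WHERE NOT (me)-[:LIKES]->(item)
    RETURN item.name AS recommendation, count(*) AS score
    ORDER BY score DESC
    LIMIT 5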

Read more…

A Hands-on Introduction to R

OSCON 2013 Speaker Series

R is an open-source statistical computing environment, similar to SAS and SPSS, that allows for the analysis of data using various techniques like subsetting, manipulation, visualization, and modeling. There are versions that run on Windows, Mac OS X, Linux, and other Unix-compatible operating systems.

To follow along with the examples below, download and install R from your local CRAN mirror found at r-project.org. You’ll also want to place the example CSV into your Documents folder (Windows) or home directory (Mac/Linux).

After installation, open the R application. The R Console will pop up automatically. This is where R code is processed. To begin writing code, open an editor window (File -> New Script on Windows or File -> New Document on a Mac) and type the following code into your editor:
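    # a first line of code to run from the editor
    1+1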

Place your cursor anywhere on the “1+1” code line, then hit Control-R (on Windows) or Command-Return (on a Mac). You’ll notice that your “1+1” code is automatically executed in the R Console. This is the easiest way to run code in R. You can also run R code by typing it directly into the R Console, but using the editor is much easier.

If you want to refresh your R Console, click anywhere inside of it and hit Control-L (on Windows) or Command-Option-L (on a Mac).

Now let’s create a vector, the simplest possible data structure in R. A vector is similar to a column of data inside a spreadsheet. We use the combine function, c(), to do so:
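    # the name raysVector comes from the text below; the values are arbitrary examples
    raysVector <- c(2, 5, 1, 9, 12)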

To view the contents of raysVector, first run the line of code above. Then double-click on raysVector in the editor and run the code that is automatically highlighted after double-clicking. You will now see the contents of raysVector in your R Console.

The object we just created is now stored in memory and we can see this by running the following code:
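    # ls() lists the objects currently in the workspace; raysVector should appear
    ls()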

R is an interpreted language with support for procedural and object-oriented programming. Here we use the mean function to calculate the statistical mean of raysVector:
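    # arithmetic mean of the values in raysVector
    mean(raysVector)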

Getting help on the mean function is easy using:
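    # open the help page for the mean function
    ?mean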

We can create a simple plot of raysVector using:
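    # a basic plot of the values against their index
    plot(raysVector)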

Importing CSV files is simple too:
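    # the data frame name and file name are illustrative; use the CSV you placed in your Documents/home folder
    raysData <- read.csv("example.csv")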

We can subset the CSV data in many different ways. Here are two different methods that do the same thing:
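    # two equivalent ways to pull out a single column (the age column name is assumed from the text below)
    raysData$age
    raysData[, "age"]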

There are many ways to transform your data in R. Here’s a method that doubles everyone’s age:
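    # overwrite the age column with doubled values
    raysData$age <- raysData$age * 2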

The apply function allows us to apply a standard or custom function without loops. Here we apply the mean function column-wise to the first 3 rows of the dataset in order to analyze its age and height columns, ignoring missing values during the calculation:
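    # column-wise (MARGIN = 2) means over the first three rows; na.rm = TRUE skips missing values
    apply(raysData[1:3, c("age", "height")], 2, mean, na.rm = TRUE)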

Here we build a linear regression model that predicts a person’s weight based on their age and height:
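    # a linear model: weight as a function of age and height (column names assumed from the text)
    raysModel <- lm(weight ~ age + height, data = raysData)
    summary(raysModel)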

We can plot our residuals like this:
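    # residuals of the fitted model, plotted against observation index
    plot(resid(raysModel))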

We can install the Predictive Model Markup Language (PMML) package to quickly deploy our predictive model in a Business Intelligence system without custom SQL:
Read more…

Get Hadoop, Hive, and HBase Up and Running in Less Than 15 Minutes

OSCON 2013 Speaker Series

If you have delved into Apache Hadoop and related projects, you know that installing and configuring Hadoop is hard. Often, a minor mistake made while installing or configuring the messy tarballs will lurk for a long time until some otherwise innocuous change to the system or workload causes difficulties. Moreover, there is little to no integration testing among different projects (e.g., Hadoop, Hive, HBase, ZooKeeper, etc.) in the ecosystem. Apache Bigtop is an open source project aimed at bridging exactly those gaps by:

1. Making it easier for users to deploy and configure Hadoop and related projects on their bare metal or virtualized clusters.

2. Performing integration testing among various components in the Hadoop ecosystem.

More about Apache Bigtop

The primary goal of Apache Bigtop is to build a community around the packaging and interoperability testing of Hadoop related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc.) developed by a community with a focus on the system as a whole, rather than individual projects.

The latest released version of Apache Bigtop is Bigtop 0.5, which integrates the latest versions of various projects including Hadoop, Hive, HBase, Flume, Sqoop, Oozie, and many more! The supported platforms include CentOS/RHEL 5 and 6, Fedora 16 and 17, SuSE Linux Enterprise 11, OpenSuSE 12.2, Ubuntu LTS Lucid and Precise, and Ubuntu Quantal.
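To give a feel for deployment on one of the Debian-flavored platforms above, here is a rough sketch of a pseudo-distributed install from Bigtop packages; the package and service names are typical of Bigtop-style packaging but should be treated as illustrative, not exact:

    # after adding the Bigtop apt repository for your release (see the Bigtop site for the repo file)
    sudo apt-get update
    # install a pseudo-distributed Hadoop configuration plus Hive and HBase
    sudo apt-get install hadoop-conf-pseudo hive hbase
    # format HDFS and start the core daemons
    sudo -u hdfs hdfs namenode -format
    sudo service hadoop-hdfs-namenode start
    sudo service hadoop-hdfs-datanode start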

Who uses Bigtop?

Folks who use Bigtop can be divided into two major categories. The first category consists of those who leverage Bigtop to power their own Hadoop distributions. The second consists of those who use Bigtop for deployment purposes.

In alphabetical order, they are:
Read more…
