ENTRIES TAGGED "Java"
20 years of efficiently computing Bacon numbers
The Oracle at Delphi spoke just one language, a cryptic one that priests “compiled” into ancient Greek. The Oracle of Bacon—the website that plays the Six Degrees of Kevin Bacon game for you—has, in its 20-year existence, been written in six languages. Read on for the history and the reasons why.
The original version of the Oracle of Bacon, written by Brett Tjaden in 1995, was all C. The current version, my stewardship of it, and my revision control history only go back to 1999, so that’s where I’ll start. In 1999, I rewrote the Oracle… still entirely in C. Expensive shortest-path and spell-check algorithms? Definitely in C. String processing to build the database? Also C! Presentation layer to parse CGI parameters and generate HTML? C here, too!
Why C? The rationale for the algorithmic component was straightforward: the Oracle of Bacon ran on a slow, shared Unix machine that other people were using to get actual work done. Minimizing CPU and memory resource requirements was the polite thing to do. I needed a compiled language that let me optimize time and space extensively. The loops all counted down, not up, because comparing against zero was fractionally faster on SPARC. It had to be C.
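The shortest-path computation behind a Bacon number can be sketched as a breadth-first search over a co-star graph. This is only an illustration in Python, not the Oracle's C code; the `bacon_number` function, the `COSTARS` graph, and the actor names are hypothetical.

```python
from collections import deque


def bacon_number(costars, start, target="Kevin Bacon"):
    """Return the fewest co-star links from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for other in costars.get(actor, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None  # no chain of co-stars connects the two actors


# Hypothetical toy graph: each actor maps to the actors they appeared with.
COSTARS = {
    "Kevin Bacon": {"Actor A"},
    "Actor A": {"Kevin Bacon", "Actor B"},
    "Actor B": {"Actor A"},
    "Actor C": set(),
}
```

Breadth-first search visits actors in order of distance, so the first time it reaches the target it has found a shortest chain.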
But why were the offline string processing and the CGIs in C? Mostly, I think, to reuse code from the other parts of the code base and from previous projects I’d written when C was the only language I knew.
As the site added features, I got tired of writing code to generate HTML in C. I wrote new CGIs, then rewrote existing CGIs, in Perl. Simply put, writing the CGIs in an interpreted language made me more productive. I had hash tables and vectors built into the language and CGI support a simple “use” statement away. I didn’t have to compile on one server and then deploy to another—I could edit the CGIs right there on the web server. Good deployment practices it wasn’t, but it made me more productive as a programmer, and the performance of the CGIs didn’t matter all that much.
What you'll need to know to start your Java 8 migration process today
There was recently a thread on the London Java Community mailing list about when people should think about adopting Java 8. Lambdas, an improved collections library, new date and time support, and a host of under-the-hood tweaks add up to a lot of compelling reasons for people to upgrade. There’s still a lot of confusion over when and how to accomplish it, though, so here’s a helpful guide.
When will Java 8 be released?
The GA (General Availability) release of Oracle’s JRE and JDK, which is probably the JVM that you’re using, is scheduled for March 18th. It may take other JVM vendors a while to release their implementations if you aren’t using OpenJDK or the Oracle JDK.
So I should just upgrade on release date, right?
That would be a very brave move to make. A huge amount of resources goes into testing Java 8 and ensuring that things will work out of the box on the day of release. However, the massive ecosystem of Java libraries means that not everything can be tested to destruction in time. It’s very likely that there will be outstanding bugs upon release. You should expect update releases a month or two after GA, and these typically address the major problems.
It’s also important to think about what libraries or frameworks your application depends on. If you’re just writing plain old Java then an existing codebase is likely to work fine. If, on the other hand, you depend on a library or framework that tries to do something clever then you may run into problems.
Targeting the highest common denominator
Some would claim that native is the best approach, but that looks at existing WORA tools/communities, which mostly target cost saving. In fact, even native Android/iOS tools produce rather bad results without deep platform familiarity. Native is very difficult to properly maintain in the real world, and this is easily noticeable by inspecting the difficulties we have with the ports of Codename One. This problem is getting worse rather than better as platforms evolve and fragment. For example, some devices crash when you take more than one photo in a row, some devices have complex issues with HTTP headers, and many have issues when editing text fields in the “wrong position”.
There are workarounds for everything, but you need to do extensive testing to become aware of the problem in the first place. WORA solutions bring all the workarounds and the “ugly” code into their porting layer, allowing developers to focus on their business logic. This is similar to Spring/Java EE approaches that addressed complexities of application server fragmentation.
Let the environment do more of the work
Functional programming keeps growing. While it has long been a popular topic in academic circles, and many of my CS-educated friends wonder why it took me so long to discover it, the shift in approach that functional programming requires made it a hard sell in the commercial world. As our computers have become more and more powerful and our problems more complex, functional programming approaches and environments seem better able to shoulder those loads.
Neal Ford, Meme Wrangler at ThoughtWorks, has been showing developers how to shift from classic imperative models to cleaner functional approaches. I was lucky to get to talk with him at OSCON, and we’ve also posted his OSCON talk, with many more concrete code examples.
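As a small illustration of that shift (a sketch of my own, not taken from Neal Ford's talk), here is the same computation written twice in Python. The function names are hypothetical. The imperative version manages the loop and the accumulator by hand; the functional version describes the transformation and leaves the iteration to the runtime.

```python
from functools import reduce


def sum_even_squares_imperative(numbers):
    """Imperative style: explicit loop and mutable accumulator."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_functional(numbers):
    """Functional style: compose filter, map, and reduce with no mutation."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, n: acc + n, squares, 0)
```

Both functions return the same result for the same input. The difference is who is responsible for stepping through the data.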
Every programming experience teaches
I’ve never formally trained to be a programmer, outside of occasional conference workshops and a week of XSL tutorials. In some ways, that’s terrible, because it’s taken me about thirty years to learn what some friends of mine appear to have learned in four. I’ve written some code that goes way beyond spaghetti, though fortunately the worst of it was probably when I was 15.
On the bright side, when I look past my many mistakes, I can see what I learned from a wide variety of experiences, and the pieces they helped me see. It’s a little easier to tell this story through the parts than it might be through a formal curriculum.
- My parents’ FORTRAN books
  - I was reading computer books, dry ones, before I even got to play. I have vague memories about program structure, but mostly I learned that knowledge sticks better if it includes hands-on work, and not just a book.
- Sinclair ZX81
  - 1K of memory! The sheer thrill of seeing my creations on screen was amazing. I had just enough logic to get things done, and leave myself puzzled. The Sinclair community seemed focused on making great small things. I learned simple logic in BASIC and that sometimes it takes a hack to get things done.
- Applesoft BASIC
  - After Sinclair BASIC, Applesoft seemed vast. Much of what I did was transfer what I’d done on the Sinclair (itself a lesson in platform-shifting). As I settled, I started writing larger and larger programs, eventually forcing myself to restructure everything into subroutines…with global variables, of course.
- 6502 Assembly
  - I knew there was more than BASIC. My early adventures with assembly language were mostly about graphics, and didn’t work all that well, but I picked up two key things: recursion and the importance of registers.
OSCON 2013 Speaker Series
NOTE: If you are interested in attending OSCON to check out Dave’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% off your registration fee.
Since 2009, I’ve been leading the optimization team at AppNexus, a real-time advertising exchange. On this exchange, advertisers participate in real-time auctions to bid on individual ad impressions. The highest bid wins the auction, and that advertiser gets to show an ad. This allows advertisers to carefully target where they advertise—maximizing the effectiveness of their advertising budget—and lets websites maximize their ad revenue.
We do these auctions often (~50 billion a day) and fast (<100 milliseconds). Not surprisingly, this creates a lot of technical challenges. One of those challenges is how to automatically maximize the value advertisers get for their marketing budgets—systematically driving consumer engagement through ad placements on particular websites, times of day, etc.—and we call this process “optimization.” The volume of data is large, and the algorithms and strategies aren’t trivial.
In order to win clients and build our business to the scale we have today, it was crucial that we build a world-class optimization system. But when I started, we didn’t have a scalable tech stack to process the terabytes of data flowing through our systems every day, and we didn't have the team to do any of the required data modeling.
So, we needed to hire great people fast. However, there aren’t many veterans in the advertising optimization space, and because of that, we couldn’t afford to narrow our search to only experts in Java or R or Matlab. In order to give us the largest talent pool possible to recruit from, we had to choose a tech stack that is both powerful and accessible to people with diverse experience and backgrounds. So we chose Python.
Python is easy to learn. We found that people coding in R, Matlab, Java, PHP, and even those who have never programmed before could quickly learn and get up to speed with Python. This opened us up to hiring a tremendous pool of talent who we could train in Python once they joined AppNexus. To top it off, there’s a great community for hiring engineers and the PyData community is full of programmers who specialize in modeling and automation.
Additionally, Python has great libraries for data modeling. It offers strong analytical tools for analysts and quants: when combined, Pandas, IPython, and Matplotlib give you a lot of the functionality of Matlab or R. This made it easy to hire and onboard our quants and analysts who were familiar with those technologies. Even better, analysts and quants can share their analysis through the browser with IPython.
Now that we had all of these wonderful employees, we needed a way to cut down the time to get them ramped up and pushing code to production.
First, we wanted to get our analysts and quants looking at and modeling data as soon as possible. We didn’t want them worrying about writing database connector code, or figuring out how to turn a cursor into a data frame. To tackle this, we built a project called Link.
Imagine you have a MySQL database. You don’t want to hardcode all of your connection information because you want to have a different config for different users, or for different environments. Link allows you to define your “environment” in a JSON config file, and then reference it in code as if it is a Python object.
Now, with only three lines of code you have a database connection and a data frame straight from your MySQL database. This same methodology works for Vertica, Netezza, Postgres, SQLite, etc. New “wrappers” can be added to accommodate new technologies, allowing team members to focus on modeling the data, not how to connect to all these weird data sources.
In [1]: from link import lnk
In [2]: my_db = lnk.dbs.my_db
In [3]: df = my_db.select('select * from my_table').as_dataframe()

Int64Index: 325 entries, 0 to 324
id         325 non-null values
user_id    323 non-null values
app_id     325 non-null values
name       325 non-null values
body       325 non-null values
created    324 non-null values
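The idea of defining an environment in a JSON config and then referencing it as a Python object can be sketched roughly as follows. This is not Link's implementation, and Link's real config format may differ. The config keys, `SqliteWrapper`, `WRAPPERS`, and `load_environment` are all hypothetical, and the sketch uses an in-memory SQLite database and plain dictionaries instead of MySQL and a data frame so that it stays self-contained.

```python
import json
import sqlite3
from types import SimpleNamespace

# Hypothetical config; Link's real file format may differ.
CONFIG_JSON = """
{
  "dbs": {
    "my_db": {"wrapper": "sqlite", "database": ":memory:"}
  }
}
"""


class SqliteWrapper:
    """Opens a connection from a config entry and runs queries."""

    def __init__(self, database):
        self.conn = sqlite3.connect(database)
        self.conn.row_factory = sqlite3.Row  # rows expose column names

    def select(self, query, params=()):
        """Run a query and return the rows as a list of dicts."""
        return [dict(row) for row in self.conn.execute(query, params)]


# Registry of wrapper types; new data sources would be added here.
WRAPPERS = {"sqlite": SqliteWrapper}


def load_environment(config_text):
    """Turn the JSON config into an object with attribute access."""
    config = json.loads(config_text)
    dbs = {}
    for name, entry in config["dbs"].items():
        entry = dict(entry)
        wrapper_cls = WRAPPERS[entry.pop("wrapper")]
        dbs[name] = wrapper_cls(**entry)
    return SimpleNamespace(dbs=SimpleNamespace(**dbs))


env = load_environment(CONFIG_JSON)
my_db = env.dbs.my_db
my_db.conn.execute("create table my_table (id integer, name text)")
my_db.conn.execute("insert into my_table values (1, 'first')")
rows = my_db.select("select * from my_table")
```

Because the connection details live in the config, the calling code stays the same when a different user or environment supplies a different file.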
By having the flexibility to easily connect to new data sources and APIs, our quants were able to adapt to the evolving architectures around us, and stay focused on modeling data and creating algorithms.
Second, we wanted to minimize the amount of work it took to take an algorithm from research/prototype phase to full production scale. Luckily, with everyone working in Python, our quants, analysts, and engineers are using the same language and data processing libraries. There was no need to re-implement an R script in Java to get it out across the platform.
Creating Glassware today and what's in store for tomorrow
You’ve likely already seen pictures of people using Google Glass, and you may even have spotted a pair in the wild. After getting a quick demo myself, I spoke with Maximiliano Firtman about his talk at Fluent conference that covers what developers need to start doing and thinking about when it comes to developing apps for this new environment.
Key highlights include:
- The current version supports cloud-based web applications that can be built in any language using the Mirror API. [Discussed at 0:30]
- A forthcoming SDK will support native app development, essentially Android apps written in Java. [Discussed at 2:20]
- The only truly augmented reality type application currently available is Google Maps. [Discussed at 3:30]
- Developers need to think outside the technical details as well, and spend time considering how people will be interacting with Google Glass—it’s a uniquely new paradigm with unique use cases. [Discussed at 4:14]
- While the beta (Explorer) program is currently closed, Max expects to see more devices available and “on the street” within the next year. [Discussed at 6:10]
A good match for the similarly unexpected Web?
Rob Pike on how Go fits into today's computing environment
The Go programming language was created by Rob Pike, Ken Thompson, and Robert Griesemer. Pike (@rob_pike) recently told me that Go was born while they were waiting a long while for some code to compile — too long.
C++ and Java have long been the go-to languages for big server or system programs, but they were created almost 30 and 20 years ago, respectively. They don’t address very well the issues programmers see today like use of concurrency and incorporating big data and they’re not optimal for the current programming environment.
One main reason Pike expects Go to succeed is how it deals with concurrency. In his view, it provides a better model than Java and C++, as well as Python, Ruby, and the other scripting languages, one that is able to work within the computing environment into which it was born.
Google dodges a bullet, a new Perl in town, and GCC loses an OS.
Oracle fails to convince a jury that Google owes them big bucks, the annual refresh of Perl has arrived, and FreeBSD says goodbye to an increasingly restrictive GCC license.