"optimization" entries

It’s time for a web page diet

Site speed is essential to business success, yet many pages are getting bigger and slower.

Earlier this year, I was researching online consumer preferences for a client and discovered, somewhat unsurprisingly, that people expect web sites to be fast and responsive, particularly when they’re shopping. What did surprise me, however, were findings in Radware’s “State of the Union Report Spring 2014” (registration required) that showed web sites, on average, were becoming bigger in bytes and slower in response time every year. In fact, the average Alexa 1000 web page had grown from around 780KB and 86 resources in 2011 to more than 1.4MB and 99 resources by the time of the early-2014 “State of the Union Winter Report.”

As an experiment, I measured the resources loaded for Amazon.com on my own computer: 2.6MB loaded with 252 requests!
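If you want to reproduce this kind of measurement, the numbers fall straight out of a HAR file exported from your browser’s developer tools. Here is a minimal sketch in Python; the file name page.har is hypothetical, and it assumes a standard HAR 1.2 export (where a response’s bodySize can be -1 when unknown):

import json

# Load a HAR archive previously exported from the browser's network panel.
# "page.har" is a hypothetical file name; any HAR 1.2 export should work.
with open("page.har") as f:
    har = json.load(f)

entries = har["log"]["entries"]
# bodySize is -1 when the size is unknown, so clamp it to zero.
total_bytes = sum(max(e["response"].get("bodySize", 0), 0) for e in entries)

print("requests:", len(entries))
print("total size: %.1f MB" % (total_bytes / (1024.0 * 1024.0)))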

This seemed so odd. Faster is more profitable, yet companies were actually building fatter and slower web sites. What was behind all these bytes? Had web development become so sophisticated that all the technology would bust the seams of the browser window? Read more…

Tailoring CSS for performance

Rethinking CSS delivery

In my last article, I demonstrated how improved performance and a higher PageSpeed Insights score were achieved by removing unnecessary external JavaScript and CSS requests. YepNope was also used to manage the asynchronous loading of external requests.

After the improvements, I thought it was time to move on, but PageSpeed Insights advised there was more work to do.

Read more…

Tailoring for performance

One source does not fit all

Like a lot of web teams, O’Reilly’s web group has increased its focus on using global components to better scale maintenance and optimize workflow. From a load-time measurement perspective, our performance ratings stay near our benchmarks. However, a recent analysis using metrics other than load time found that our global efforts may have sacrificed performance on a handful of highly visible and heavily visited web pages.

After identifying the popular pages, we sought to improve the use of global components with server-side logic, regex, and asynchronous loading. After re-measuring these popular pages, we arrived at faster load times and an improved perception of speed. Read more…

Scaling People, Process, and Technology with Python

OSCON 2013 Speaker Series

NOTE: If you are interested in attending OSCON to check out Dave’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% off your registration fee.

Since 2009, I’ve been leading the optimization team at AppNexus, a real-time advertising exchange. On this exchange, advertisers participate in real-time auctions to bid on individual ad impressions. The highest bid wins the auction, and that advertiser gets to show an ad. This allows advertisers to carefully target where they advertise—maximizing the effectiveness of their advertising budget—and lets websites maximize their ad revenue.
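As a rough sketch of the winner-selection step just described (the data layout and the simple highest-bid-wins rule here are illustrative assumptions, not AppNexus’s actual auction code):

# Minimal sketch: pick the winning bid for a single ad impression.
# Bids are (advertiser, bid_price) pairs; the highest bid wins, as described above.
def run_auction(bids):
    if not bids:
        return None  # no ad is served for this impression
    return max(bids, key=lambda bid: bid[1])

winner = run_auction([("adv_a", 1.25), ("adv_b", 2.10), ("adv_c", 0.80)])
print(winner)  # ('adv_b', 2.1)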

We do these auctions often (~50 billion a day) and fast (<100 milliseconds). Not surprisingly, this creates a lot of technical challenges. One of those challenges is how to automatically maximize the value advertisers get for their marketing budgets—systematically driving consumer engagement through ad placements on particular websites, times of day, etc.—and we call this process “optimization.” The volume of data is large, and the algorithms and strategies aren’t trivial.

In order to win clients and build our business to the scale we have today, it was crucial that we build a world-class optimization system. But when I started, we didn’t have a scalable tech stack to process the terabytes of data flowing through our systems every day, and we didn’t have the team to do any of the required data modeling.

People

So, we needed to hire great people fast. However, there aren’t many veterans in the advertising optimization space, and because of that, we couldn’t afford to narrow our search to only experts in Java or R or Matlab. In order to give us the largest talent pool possible to recruit from, we had to choose a tech stack that was both powerful and accessible to people with diverse experience and backgrounds. So we chose Python.

Python is easy to learn. We found that people coding in R, Matlab, Java, PHP, and even those who had never programmed before could quickly learn and get up to speed with Python. This opened us up to a tremendous pool of talent whom we could train in Python once they joined AppNexus. To top it off, there’s a great community of Python engineers to hire from, and the PyData community is full of programmers who specialize in modeling and automation.

Additionally, Python has great libraries for data modeling. It offers great analytical tools for analysts and quants, and when combined, Pandas, IPython, and Matplotlib give you much of the functionality of Matlab or R. This made it easy to hire and onboard quants and analysts who were familiar with those technologies. Even better, analysts and quants can share their analysis through the browser with IPython.
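As a small, hypothetical illustration of that workflow (the CSV file and column names are invented for the sketch), a typical slice-and-dice analysis in IPython might look like this:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical campaign data: one row per ad placement.
df = pd.read_csv("placements.csv")

# Summarize click-through rate by hour of day -- the kind of analysis
# you might otherwise reach for Matlab or R to do.
ctr_by_hour = (df.groupby("hour")["clicks"].sum()
               / df.groupby("hour")["impressions"].sum())

ctr_by_hour.plot(kind="bar")
plt.ylabel("CTR")
plt.show()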

Process

Now that we had all of these wonderful employees, we needed a way to cut down the time to get them ramped up and pushing code to production.

First, we wanted to get our analysts and quants looking at and modeling data as soon as possible. We didn’t want them worrying about writing database connector code, or figuring out how to turn a cursor into a data frame. To tackle this, we built a project called Link.

Imagine you have a MySQL database. You don’t want to hardcode all of your connection information, because you want different configs for different users or different environments. Link allows you to define your “environment” in a JSON config file and then reference it in code as if it were a Python object.

 { "dbs":{
  "my_db": {
   "wrapper": "MysqlDB",
   "host": "mysql-master.123fakestreet.net",
   "password": "",
   "user": "",
   "database": ""
  }
 }}

Now, with only three lines of code, you have a database connection and a data frame straight from your MySQL database. The same methodology works for Vertica, Netezza, Postgres, SQLite, etc. New “wrappers” can be added to accommodate new technologies, allowing team members to focus on modeling the data rather than on how to connect to all these weird data sources.

In [1]: from link import lnk

In [2]: my_db = lnk.dbs.my_db

In [3]: df = my_db.select('select * from my_table').as_dataframe()

Int64Index: 325 entries, 0 to 324
Data columns:
id         325 non-null values
user_id    323 non-null values
app_id     325 non-null values
name       325 non-null values
body       325 non-null values
created    324 non-null values
By having the flexibility to easily connect to new data sources and APIs, our quants were able to adapt to the evolving architectures around us, and stay focused on modeling data and creating algorithms.

Second, we wanted to minimize the amount of work it took to take an algorithm from the research/prototype phase to full production scale. Luckily, with everyone working in Python, our quants, analysts, and engineers were using the same language and data processing libraries. There was no need to re-implement an R script in Java to get it out across the platform.
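As a hedged sketch of what that looks like in practice (the module, function, and column names are invented for illustration), the scoring code a quant prototypes in a notebook can be imported unchanged by the production job:

# scoring.py -- a hypothetical shared module a quant might write while prototyping
import pandas as pd

def score_placements(df):
    """Rank ad placements by observed click-through rate."""
    out = df.copy()
    out["ctr"] = out["clicks"] / out["impressions"]
    return out.sort_values("ctr", ascending=False)

# The production pipeline imports the very same function, so nothing has to be
# re-implemented in another language:
#
#     from scoring import score_placements
#     ranked = score_placements(load_todays_placements())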
Read more…

Interoperating the industrial Internet

If we're going to build useful applications on top of the industrial Internet, we must ensure the components interoperate.

One of the most interesting points made in GE’s “Unleashing the Industrial Internet” event was GE CEO Jeff Immelt’s statement that only 10% of the value of Internet-enabled products is in the connectivity layer; the remaining 90% is in the applications that are built on top of that layer. These applications enable decision support and the optimization of large-scale systems (systems “above the level of a single device,” to use Tim O’Reilly’s phrase), and they empower consumers.

Given the jet engine that was sitting on stage, it’s worth seeing how far these ideas can be pushed. Optimizing a jet engine is no small deal; Immelt said that the engine gained an extra 5-10% efficiency through software, and that adds up to real money. The next stage is optimizing the entire aircraft; that’s certainly something GE and its business partners are looking into. But we can push even harder: optimize the entire airport (don’t you hate it when you’re stuck on a jet waiting for one of those trucks to push you back from the gate?). Optimize the entire air traffic system across the worldwide network of airports. This is where we’ll find the real gains in productivity and efficiency.

So it’s worth asking about the preconditions for those kinds of gains. It’s not computational power; when you come right down to it, there aren’t that many airports, and there aren’t that many flights in the air at one time. Something like 10,000 flights are aloft worldwide at any given moment, and in these days of big data and big distributed systems, that’s not a terribly large number. It’s not our ability to write software; there would certainly be some tough problems to solve, but nothing as difficult as, say, searching the entire web and returning results in under a second. Read more…

Industrial Internet links

Robots, railways, the Internet of very small things, and SQL injection in solar panels.

Here’s a broad look at a few recent items of interest related to the industrial Internet — the world of smart, connected, big machines.

Smarter Robots, With No Wage Demands (Bloomberg Businessweek) — By building more intelligence into robots, Rethink Robotics figures it can get them into jobs where work has historically been too irregular or too small-scale for automation. That could mean more manufacturing stays in American factories, though perhaps with fewer workers.

The Great Railway Caper (O’Reilly Strata EU) — Today’s railroads rely heavily on the industrial Internet to optimize locomotive operations and maintain their very valuable physical plant. Some of them were pioneers in big networked machines. Part of Sprint originated as the Southern Pacific Railroad Network of Intelligent Telecommunications, which used the SP’s rights-of-way to transmit microwave and fiber optic signals. But in the 1950s, computing in railways was primitive (as it was just about everywhere else, too). John Graham-Cumming relayed this engaging story of network optimization in 1955 at our Strata Conference in London two weeks ago.

Read more…

Six themes from Velocity Europe

Cultural shifts and handling large-scale growth among the emerging trends in the WPO and DevOps communities

By Steve Souders and John Allspaw

More than 700 performance and operations engineers were in London last week for Velocity Europe 2012. Below, Velocity co-chairs Steve Souders and John Allspaw note high-level themes from across the various tracks (especially the hallway track) that are emerging for the WPO and DevOps communities.

Velocity Europe 2012 in London

Performance themes from Steve Souders

I was in awe of the speaker and exhibitor lineup going into Velocity Europe. It was filled with knowledgeable gurus and industry leaders. As Velocity Europe unfolded, a few themes kept recurring, and I wanted to share those with you.

Performance matters more — The places and ways that web performance matters keep growing. The talks at Velocity covered desktop, mobile (native, web, and hybrid), tablet, TV, DSL, cable, FiOS, 3G, 4G, LTE, and WiMAX across social, financial, ecommerce, media, games, sports, video, search, analytics, advertising, and enterprise. Although all of the speakers were technical, they talked about how the focus on performance extends to other departments in their companies, as well as the impact performance has on their users. Web performance has permeated all aspects of the web and has become a primary focus for web companies. Read more…

Giving the Velocity website a performance makeover

Four simple optimization steps produce big results.

Learn how producers slimmed down the Velocity conference site, cutting the site's load time by 3.5 seconds and dropping 49% of the page weight.

Velocity 2011 debrief

Steve Souders weighs in on Velocity 2011 and looks ahead to upcoming Velocity events.

This was Velocity's fourth year, and while every year has seen significant growth, the 2011 conference felt like a tremendous step forward in all areas.

Radar's top stories: May 30-June 3, 2011

Inside the Library of Congress' Twitter archive, 10 ways to botch a mobile app, the story behind Velocity '11

This week on Radar: We checked in on the Library of Congress' Twitter archive, Ken Yarmosh revealed 10 ways to screw up a mobile app, the story behind Velocity 2011 was told, Steve Souders discussed mobile optimization, and we wondered if readers would fund their favorite authors.