"language" entries

Four short links: 14 January 2011

Borders, Monitoring, Data Visualization, and Localization

  1. What Went Wrong at Borders (The Atlantic) — a short summary of the decline and fall of Borders. Borders has a special place in our hearts at O’Reilly: it was a buyer for Borders who pointed out that Programming Perl was one of their top-selling books in any category, which got Tim focused on the Open Source story.
  2. Virtues of Monitoring — great explanation of the different levels of monitoring you could (and should) have in your application. (via Simon Willison)
  3. Getting Started with Processing and Data Visualization — a quick intro to building data visualizations with Processing. Nice variety in the examples, too. (via Hacker News)
  4. A Localization Horror Story — how hard it is to localize correctly. A wonderful article that is ruthlessly accurate in its descriptions of the pains of localizing software, which is no easier today despite the article being over a decade old.
Four short links: 13 December 2010

Mobile Clawback, Language Design, Gawker Hacked, and Science Tools

  1. European mobile operators say big sites need to pay for users’ data demands (Guardian) — it’s like the postal service demanding that envelope makers pay them because they’re not making enough money just selling stamps. What idiocy.
  2. Grace Programming Language — language designers working on a new teaching language.
  3. Gawker Media’s Entire Database Hacked — 1.5M usernames and passwords, plus content from their databases, in a torrent. What’s your plan to minimize the harm of an event like this, and to recover? (via Andy Baio)
  4. Macmillan Do Interesting Stuff (Cameron Neylon) — have acquired some companies that provide software tools to support scientists, and are starting a new line of business around it. I like it because it’s a much closer alignment of scientists’ interests with profit motive than, say, journals. Timo Hannay, who heads it, runs Science Foo Camp with Google and O’Reilly.
Four short links: 2 August 2010

Search Tips, Web Parsing, DNS Blacklists, Complex Machines

  1. Hidden Features of Google (StackExchange) — rather than Google’s list of search features, here are the features that real (sophisticated) users find useful. My new favourite: the ~ operator for approximate searching. (via Hacker News)
  2. Natural Language Parsing for the Web — JSON API to the Stanford Natural Language Parser. I wonder why the API to the library isn’t an open source library, given the Stanford parser is GPLv2. It’d be super-cool to have this as an EC2 instance, Ubuntu package, or Chef recipe so it’s trivial to add to an existing hosted project.
  3. Taking Back the DNS (Paul Vixie) — defining a spec whereby you can subscribe to blacklists for DNS, as “Most new domain names are malicious.”
  4. Building Complex Machines with Lego — I saw the (Lego) Antikythera Mechanism at Sci Foo. It’s as amazing as it looks.
Four short links: 22 June 2010

Fast Scans, Touch Screens, Privacy Newspeak, and Open Source Fonts

  1. High-Speed Book Scanner — you flip the pages, and it uses high-speed photography to capture images of each page. “But they’re all curved!” Indeed, so they project a grid onto the page so as to be able to correct for the curvature. The creator wanted to scan Manga, but the first publisher he tried turned him down. I’ve written to him offering a pile of O’Reilly books to test on. We love this technology!
  2. Magic Tables, not Magic Windows (Matt Jones) — thoughtful piece about how touch-screens are rarely used as a controller of abstract things rather than of real things, with some examples of the potential he’s talking about. “When we’re not concentrating on our marbles, we’re looking each other in the eye – chuckling, tutting and cursing our aim – and each other. There’s no screen between us, there’s a magic table making us laugh. It’s probably my favourite app to show off the iPad – including the ones we’ve designed! It shows that the iPad can be a media surface to share, rather than a proscenium to consume through alone.”
  3. Myths and Fallacies of Personally Identifiable Information — particularly relevant after reading Apple’s new iTunes privacy policy. “We talk about the technical and legal meanings of ‘personally identifiable information’ (PII) and argue that the term means next to nothing and must be greatly de-emphasized, if not abandoned, in order to have a meaningful discourse on data privacy.” (via Pete Warden)
  4. Mensch Font — an interesting font, but this particularly caught my eye: “Naturally I searched for a font editor, and the best one I found was Font Forge, an old Linux app ported to the Mac but still requiring X11. So that’s two ways OS X is borrowing from Linux for font support. What’s up with that? Was there an elite cadre of fontistas working on Linux machines in a secret bunker? Linux is, um, not usually known for its great designers.” (via joshua on Delicious)
Four short links: 20 May 2010

New Take on Ubicomp, Language Insight, Sexy Viz, and iPad Usability

  1. People are Walking Architecture — presentation by Matt Jones of BERG, taking a new lens to this AR/ubicomp/whatever-it-is-today world. “[Mobile phones are] a whole toy box full of playful, inventive strategies for exploring cities ….”
  2. Lexicalist — insight into geographic and age distribution of language use, based on Twitter data. (via Language Log)
  3. Advanced Visualization Techniques — nice overview of some non-standard visualization techniques. Short shameful confession: I love polar dendrograms with a passion. These techniques are to visualizers as algorithms and data structures to programmers: each is used in specific circumstances and compromises some things to gain in others. (via Flowing Data)
  4. iPad Usability Report (Nielsen-Norman Group) — 93-page report based on user studies. “The iPad etched-screen aesthetic does look good. No visual distractions or nerdy buttons. The penalty for this beauty is the re-emergence of a usability problem we haven’t seen since the mid-1990s: Users don’t know where they can click. For the last 15 years of Web usability research, the main problems have been that users don’t know where to go or which option to choose — not that they don’t even know which options exist. With iPad UIs, we’re back to this square one.” (via Andrew Savikas)
Four short links: 1 March 2010

War Games, Cloud Metaphors, Plain English, and Event Correlations

  1. Meet The Sims and Shoot Them: “America’s Army has proven so popular globally that, with so many users signing on from Internet cafes in China, the Chinese government tried to ban it.” Full of interesting factoids like this about the US military-created first-person shooter America’s Army and other military uses of games. (via Jim Stogdill)
  2. Most Overused Cloud Metaphors, Sorted by Weather Pattern — headline writers beware: you are not being original with your “does the cloud have a silver lining?” folderol. (via lennysan on Twitter)
  3. Simply Understand — web site that translates a lot of UK government consultation documents (notorious for pompous and intricate prose) into plain English.
  4. Simple Event Correlator — small Unix tool to find event correlations. It isn’t doing data mining to find correlations in a data stream; instead, you write rules like “tell me if X happens within Y seconds of a Z” and it takes events on stdin and emits correlations on stdout. (via NeilNeely on Twitter)
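
The kind of rule described in item 4 ("tell me if X happens within Y seconds of a Z") can be sketched in a few lines of Python. This is only an illustration of the idea, not Simple Event Correlator's own rule syntax or implementation. The "TIMESTAMP NAME" input format, the function names, and the reading of the rule as "X follows Z" are all assumptions made for the example.

```python
from collections import deque


def correlate(events, trigger, follow_up, window_seconds):
    """Yield (trigger_time, follow_up_time) whenever `follow_up` occurs
    within `window_seconds` after a `trigger` event.

    `events` is an iterable of (timestamp_seconds, name) pairs in time order.
    """
    recent_triggers = deque()
    for timestamp, name in events:
        # Forget triggers that are now too old to match anything.
        while recent_triggers and timestamp - recent_triggers[0] > window_seconds:
            recent_triggers.popleft()
        if name == follow_up:
            for trigger_time in recent_triggers:
                yield (trigger_time, timestamp)
        if name == trigger:
            recent_triggers.append(timestamp)


def parse_line(line):
    """Parse a hypothetical 'TIMESTAMP NAME' line, e.g. '12.5 login_failed'."""
    timestamp, name = line.split(maxsplit=1)
    return float(timestamp), name.strip()


def run(lines, out, trigger, follow_up, window_seconds):
    """Read event lines from `lines` and write one line per correlation to `out`."""
    events = (parse_line(line) for line in lines if line.strip())
    for trigger_time, follow_time in correlate(events, trigger, follow_up, window_seconds):
        out.write(
            f"{follow_up} at {follow_time} within {window_seconds}s "
            f"of {trigger} at {trigger_time}\n"
        )
```

To use it as a stdin-to-stdout filter, call `run(sys.stdin, sys.stdout, "Z", "X", 5)` after `import sys`. The deque holds only the triggers still inside the window, so memory stays bounded on a long-running stream.
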
Four short links: 11 January 2010

Top for MySQL, Project Surprises, and Two Odd Little Programming Languages

  1. mytop — a MySQL top implementation to show you why your server is so damn slow right now.
  2. What Could Kill Elegant High-Value Participatory Project? “The problem was not that the system was buggy or hard to use, but that it disrupted staff expectations and behavior. It introduced new challenges for staff […]. Rather than adapt to these challenges, they removed the system. […] No librarian would get rid of all the Harry Potter books because they are ‘too popular.’ No museum would stop offering an educational program that was ‘too successful.’ These are familiar challenges that come with the job and are seen to have benefit. But if tagging creates a line or people spend too much time giving you feedback? Staff at Haarlem Oost likely felt comfortable removing the tagging shelves because they didn’t see the tagging as a patron requirement, nor the maintenance of the shelves as part of their job.”
  3. Gremlin: “a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs.” Graph structures underlie a lot of interesting data (citations, social networks, maps) and this is a sign that we’re inching towards better systems for working with those graphs. (via Hacker News)
  4. Anic — programming language based on streams and latches. I still can’t figure out whether it’s an elaborate April Fool’s Day joke that was released too soon, because the claim of “easier than *sh” is a bold one given the double-backslash and double-square-bracket-heavy syntax of the language. Important because it’s built to be parallelised, and we’re in transition pain right now between well-understood predictable languages for single CPUs (with hacks like pthreads for scaling) and experimental languages for multiple CPUs.
Four short links: 24 December 2009

Minds for Sale, Heat Death of the Web, Handheld Wireless Magazines, Joking Computers

  1. Jonathan Zittrain on “Minds for Sale” — video of a presentation he gave at the Computer History Museum about crowdsourcing. In the words of one attendee, “Zittrain focuses on the potential alienation and opportunities for abuse that can arise with the growth of distributed online production. He also contemplates the thin line that separates exploitation from volunteering in the context of online communities and collaboration.”
  2. Anatomy of a Bad Search Result — Physicists tell us that the 2nd law of Thermodynamics predicts that eventually everything in the universe will be the same temperature, the way a hot bath in a cold room ends up being a lukewarm bath in a lukewarm room. The web is entering its own heat death as SEO scum build fake sites with stolen content from elsewhere on the web. If this continues, we won’t be able to find good content for all the bullshit. “The key is to have enough dishwasher-related text to look like it’s a blog about dishwashers, while also having enough text diversity to avoid being detected by Google as duplicative or automatically generated content. So who created this fake blog? It could have been Consumersearch, or a ‘black hat’ SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know. I’m not trying to imply that Consumersearch did anything wrong. The problem is systematic. When you have a multibillion dollar economy built around keywords and links, the ultimate ‘products’ optimize for just that: keywords and links. The incentive to create quality content diminishes.”
  3. Magplus — gorgeous prototyping for how magazines might work on new handheld devices.
  4. Glasgow’s Joking Computer: “The Glasgow Science Centre in Scotland is exhibiting a computer that makes up jokes using its database of simple language rules and a large vocabulary.” It’s doing better than most 8-year-old children. In fact, if we were perfectly honest, most adults can’t pun to save themselves. Q: What do you call a shout with a window? A: A computer scream. (via Physorg News)
Four short links: 11 November 2009

Participation Tools, Open Data Requests, Go Programming Language, Why Open Source is Better

  1. ParticipateDB — database of online tools for public participation. Closed alpha now, with 32 tools and 15 projects in the database. (via Sara Winge)
  2. DataTO — like data.gov, but it’s where users request data sets. (In this case, from the Toronto municipal government)
  3. Go — new language from Bell Labs and Unix central figures Rob Pike and Ken Thompson, who now work at Google. Bits of C, bits of Google, it compiles to native binaries and runs nearly as fast as C. Built with concurrency and memory management as central features. Not used in production at Google yet, but grew from a 20% project to something worthy of public release.
  4. On Commit Bits (Jacob Kaplan-Moss) — “that day-one-commit-bit is one of the starkest differences between the corporate and the open source development model. […] Granted, Django’s very conservative when it comes to granting that commit bit, but I’m not aware of a single open source project under the sun that’d give out a commit bit on a contributor’s first day. I’ve seen developers who’ve been hired to work full time on open source work for months without commit access to the project they’re paid to develop!” One of several posts that Jacob’s made about why open source makes for (on average) better software.
Four short links: 5 November 2009

Heat Maps in R, EC2 Blackhat Tricks, Snickersome Unicode, and Decoding Statistics

  1. Heat Maps in R: “We used financial data here because it’s easier to access than the airline data, but it’s actually a pretty interesting way of looking at a financial time series. Weekend and holiday effects are a bit more obvious, and it’s a bit like being able to see the daily, weekly, monthly and yearly closes all at once (by scanning your eye over the calendar in different directions).” Includes source code. (via migurski on Delicious)
  2. BlackHat and EC2: “Theft of resources is the red-headed step-child of attack classes and doesn’t get much attention, but on cloud platforms where resources are shared amongst many users these attacks can have a very real impact. With this in mind, we wanted to show how EC2 was vulnerable to a number of resource theft attacks and the videos below demonstrate three separate attacks against EC2 that permit an attacker to boot up massive numbers of machines, steal computing time/bandwidth from other users and steal paid-for AMIs.” (via straup on Delicious)
  3. Funny Characters in Unicode — I never get tired of the wacky stuff in Unicode. I love the thought of a Unicode committee somewhere arguing passionately about the number of buttons on the snowman …. (via Hacker News)
  4. Statistics to English Translation: “The terms sensitivity and specificity generally refer to diagnostic or screening procedures, such as HIV or allergy tests. The sensitivity of a test is its true positive rate; the specificity is its true negative rate, although it can be more intuitive to think of specificity as the complement of the false positive rate.” This matters. Bandying around numbers with misleading labels, or misinterpreting numbers that have a precise and defined meaning, does not further understanding. (Said 78.4% of statisticians, with a 20% confidence factor probability of false positives)
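
The definitions quoted in item 4 are easy to check with a little arithmetic. Here is a minimal Python sketch, assuming the usual confusion-matrix counts (true/false positives and negatives); the function names and the counts in the usage note are made up for illustration and do not come from the linked article.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of actual positives the test catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: share of actual negatives the test clears."""
    return true_neg / (true_neg + false_pos)


def false_positive_rate(true_neg, false_pos):
    """Share of actual negatives the test wrongly flags."""
    return false_pos / (true_neg + false_pos)
```

With hypothetical counts of 90 true positives, 10 false negatives, 950 true negatives and 50 false positives, `sensitivity(90, 10)` is 0.9, `specificity(950, 50)` is 0.95 and `false_positive_rate(950, 50)` is 0.05. Up to floating-point rounding, specificity and the false positive rate sum to 1, which is the "complement" relationship the quote describes.
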