"social software" entries

Four short links: 19 April 2016

Four short links: 19 April 2016

Security Controls, Dataflow Checkups, Fair Use Wins, and Internet Moderators

  1. Security Controls for Computer Systems — Declassified 1970s DoD security document is still relevant today. (via Ars Technica)
  2. Checking Up on Dataflow Analyses — notable for a very easy-to-follow introduction to what dataflow analysis is. Long after the chatbot startups have flamed out, formal methods research in CS will be a key part of the next wave of software where code writes code.
  3. Fair Use Triumphs in Supreme Court (Ars Technica) — a headline I never thought I’d see in my lifetime. The Supreme Court let stand the lower court opinion that rejected the writers’ claims. That decision today means Google Books won’t have to close up shop or ask book publishers for permission to scan. In the long run, the ruling could inspire other large-scale digitization projects.
  4. The Secret History of Internet Moderators (The Verge) — the horrors and trauma of the early folks who developed content moderation systems (filtering violence, porn, child abuse, etc.) for Facebook, YouTube, and other user-contributed-content sites. It’s still a quiet and under-supported area of most startups. Some of them now meet roughly monthly for dinner, and I’m kinda glad I’m not around the table for that conversation!

(more…)

Four short links: 16 March 2016

Four short links: 16 March 2016

Analytic Monitoring, Commenter Demographics, Math and Empathy, and How We Read

  1. MacroBaseAnalytic monitoring for the Internet of Things. The code behind a research paper, written up in the morning paper where Adrian Colyer says, there is another story that also unfolds in the paper – one of careful system design based on analysis of properties of the problem space, of thinking deeply and taking the time to understand the prior art (aka “the literature”), and then building on those discoveries to advance and adapt them to the new situation. “That’s what research is all about!” you may say, but it’s also what we’d (I’d?) love to see more of in practitioner settings, too. The result of all this hard work is a system that comprises just 7,000 lines of code, and I’m sure, many, many hours of thinking!
  2. Survey of Commenters and Comment ReadersAmericans who leave news comments, who read news comments, and who do neither are demographically distinct. News commenters are more male, have lower levels of education, and have lower incomes compared to those who read news comments. (via Marginal Revolution)
  3. The Empathizing-Systemizing Theory, Social Abilities, and Mathematical Achievement in Children (Nature) — systematic thinking doesn’t predict math ability in children, but being empathetic predicts being worse at math. The effect is stronger with girls. The authors propose the mechanism is that empathetic children pick up a teacher’s own dislike of math, and any teacher biases like “girls aren’t good at math.”
  4. Moneyball for Book Publishers: A Detailed Look at How We Read (NYT) — On average, fewer than half of the books tested were finished by a majority of readers. Most readers typically give up on a book in the early chapters. Women tend to quit after 50 to 100 pages, men after 30 to 50. Only 5% of the books Jellybooks tested were completed by more than 75% of readers. Sixty percent of books fell into a range where 25% to 50% of test readers finished them. Business books have surprisingly low completion rates. Not surprisingly low to anyone who has ever read a business book. They’re always a 20-page idea stretched to 150 pages because that’s how wide a book’s spine has to be to visible on the airport bookshelf. Fat paper stock and 14-point text with wide margins and 1.5 line spacing help, too. Don’t forget to leave pages after each chapter for the reader’s notes. And summary checklists. And … sorry, I need to take a moment.
Four short links: 10 March 2016

Four short links: 10 March 2016

Cognitivist and Behaviourist AI, Math and Social Computing, A/B Testing Stats, and Rat Cyborgs are Smarter

  1. Crossword-Solving Neural NetworksHill describes recent progress in learning-based AI systems in terms of behaviourism and cognitivism: two movements in psychology that effect how one views learning and education. Behaviourism, as the name implies, looks at behaviour without looking at what the brain and neurons are doing, while cognitivism looks at the mental processes that underlie behaviour. Deep learning systems like the one built by Hill and his colleagues reflect a cognitivist approach, but for a system to have something approaching human intelligence, it would have to have a little of both. “Our system can’t go too far beyond the dictionary data on which it was trained, but the ways in which it can are interesting, and make it a surprisingly robust question and answer system – and quite good at solving crossword puzzles,” said Hill. While it was not built with the purpose of solving crossword puzzles, the researchers found that it actually performed better than commercially-available products that are specifically engineered for the task.
  2. Mathematical Foundations for Social Computing (PDF) — collection of pointers to existing research in social computing and some open challenges for work to be done. Consider situations where a highly structured decision must be made. Some examples are making budgets, assigning water resources, and setting tax rates. […] One promising candidate is “Knapsack Voting.” […] This captures most budgeting processes — the set of chosen budget items must fit under a spending limit, while maximizing societal value. Goel et al. prove that asking users to compare projects in terms of “value for money” or asking them to choose an entire budget results in provably better properties than using the more traditional approaches of approval or rank-choice voting.
  3. Power, Minimal Detectable Effect, and Bucket Size Estimation in A/B Tests (Twitter) — This post describes how Twitter’s A/B testing framework, DDG, addresses one of the most common questions we hear from experimenters, product managers, and engineers: how many users do we need to sample in order to run an informative experiment?
  4. Intelligence-Augmented Rat Cyborgs in Maze Solving (PLoS) — We compare the performance of maze solving by computer, by individual rats, and by computer-aided rats (i.e. rat cyborgs). They were asked to find their way from a constant entrance to a constant exit in 14 diverse mazes. Performance of maze solving was measured by steps, coverage rates, and time spent. The experimental results with six rats and their intelligence-augmented rat cyborgs show that rat cyborgs have the best performance in escaping from mazes. These results provide a proof-of-principle demonstration for cyborg intelligence. In addition, our novel cyborg intelligent system (rat cyborg) has great potential in various applications, such as search and rescue in complex terrains.
Four short links: 18 January 2016

Four short links: 18 January 2016

Machine Learning Technical Debt, Audio Matching, Self-Tracking Research, and Baidu's Open Source Deep Learning Code

  1. Hidden Technical Debt in Machine Learning Systems (PDF) — We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
  2. Large-Scale Content-Based Matching of Midi and Audio FilesWe present a system that can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata.
  3. Critical Social Research on Self-TrackingI am currently working on an article that is a comprehensive review of both literatures, in the attempt to outline what each can contribute to understanding self-tracking as an ethos and a practice, and its wider sociocultural implications. Here is a reading list of the work from critical social researchers that I am aware of. Trigger warning: phrases like “The discursive construction of student subjectivities.”
  4. Warp-CTC — Baidu’s open source deep learning code. Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels.
Four short links: 4 November 2015

Four short links: 4 November 2015

Data Dashboard, Feature Flags, Email Replies, and Invisible Bias

  1. re:dash — open source query editor, visualisations, dashboard for data from all sorts of databases (SQL, ElasticSearch, etc.)
  2. Feature-Flag-Driven Development — one of the key pieces of modern development systems.
  3. Gmail Suggesting RepliesIn developing Smart Reply, we adhered to the same rigorous user privacy standards we’ve always held — in other words, no humans reading your email. This means researchers have to get machine learning to work on a data set that they themselves cannot read, which is a little like trying to solve a puzzle while blindfolded — but a challenge makes it more interesting!
  4. The Selective Laziness of ReasoningAmong those participants who accepted the manipulation and thus thought they were evaluating someone else’s argument, more than half (56% and 58%) rejected the arguments that were in fact their own. Moreover, participants were more likely to reject their own arguments for invalid than for valid answers. This demonstrates that people are more critical of other people’s arguments than of their own, without being overly critical: They are better able to tell valid from invalid arguments when the arguments are someone else’s rather than their own.
Four short links: 2 November 2015

Four short links: 2 November 2015

Anti-Caching, Tyranny of Ratings, Distributed Deep Learning, and Sorting Rated Things

  1. Anti-Caching (PDF) — paper outlining a clever reframing of the database strategy of keeping frequently accessed things in-memory, namely pushing to disk the things that won’t be accessed … aka, “anti-caching.”
  2. The Rating Game (Verge) — Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings.
  3. Singa — Apache distributed deep learning platform turns 1.0.
  4. Scoring Items That Were Voted On or Rated — a Bayesian system to turn a set of ratings or up/down votes into a single score, such that you can sort a list from “best” to “worst.”
Four short links: 9 October 2015

Four short links: 9 October 2015

Page Loads, Data Engines, Small Groups, and Political Misperception

  1. Ludicrously Fast Page Loads: A Guide for Full-Stack Devs (Nate Berkopec) — steps slowly through the steps of page loading using Chrome Developer Tools’ timeline. Very easy to follow.
  2. Specialised and Hybrid Data Management and Processing Engines (Ben Lorica) — wrap-up of data engines uncovered at Strata + Hadoop World NYC 2015.
  3. Power of Small Groups (Matt Webb) — Matt’s joined a small Slack community of like-minded friends. There’s a space where articles written or edited by members automatically show up. I like that. I caught myself thinking: it’d be nice to have Last.FM here, too, and Dopplr. Nothing that requires much effort. Let’s also pull in Instagram. Automatic stuff so I can see what people are doing, and people can see what I’m doing. Just for this group. Back to those original intentions. Ambient awareness, togetherness. cf Clay Shirky’s situated software. Everything useful from 2004 will be rebuilt once the fetish for scale passes.
  4. Asymmetric Misperceptions (PDF) — research into the systematic mismatch between how politicians think their constituents feel on issues, and how the constituents actually feel. Our findings underscore doubts that policymakers perceive opinion accurately: politicians maintain systematic misperceptions about constituents’ views, typically erring by over 10 percentage points, and entire groups of politicians maintain even more severe collective misperceptions. A second, post-election survey finds the electoral process fails to ameliorate these misperceptions.
Four short links: 1 September 2015

Four short links: 1 September 2015

People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

  1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes” not “people”.
  2. Statistical Patterns in Movie Ratings (PLOSone) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
  3. The Inspection Bias is EverywhereIn 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
  4. s3qla file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
Four short links: 23 July 2015

Four short links: 23 July 2015

Open Source, State of DevOps, History of Links, and Vote Rings

  1. The Future of Open Source (Allison Randal) — Inexperienced companies can cause a great deal of harm as they blunder around blindly in a collaborative project, throwing resources in ways that ultimately benefit no one, not even themselves. It is in our best interest as a community to actively engage with companies and teach them how to participate effectively, how to succeed at free software and open source. Their success feeds the success of free software and open source, which feeds the self-reinforcing cycle of accelerating software innovation.
  2. Puppet Labs’ State of DevOps Report (PDF) — Westrum’s model gives us the language to define and measure culture. Perhaps most interesting, Westrum’s model also predicts IT performance. This shows that information flow isn’t just essential to safety, it’s also a critical success factor for rapidly building and evolving resilient systems at scale.
  3. Beyond Conversation — tracing the history of the link from Memex to Web.
  4. Detecting Vote Rings in Product Hunt — worth implementing in every system that processes votes. Who are the jerks in a circle?
Four short links: 22 June 2015

Four short links: 22 June 2015

Power Analysis, Data at Scale, Open Source Fail, and Closing the Virtuous Loop

  1. Power Analysis of a Typical Psychology Experiment (Tom Stafford) — What this means is that if you don’t have a large effect, studies with between groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.
  2. The Future of Data at ScaleData curation, on the other hand, is “the 800-pound gorilla in the corner,” says Stonebraker. “You can solve your volume problem with money. You can solve your velocity problem with money. Curation is just plain hard.” The traditional solution of extract, transform, and load (ETL) works for 10, 20, or 30 data sources, he says, but it doesn’t work for 500. To curate data at scale, you need automation and a human domain expert.
  3. Why Are We Still Explaining? (Stephen Walli) — Within 24 hours we received our first righteous patch. A simple 15-line change that provided a 10% boost in Just-in-Time compiler performance. And we politely thanked the contributor and explained we weren’t accepting changes yet. Another 24 hours and we received the first solid bug fix. It was golden. It included additional tests for the test suite to prove it was fixed. And we politely thanked the contributor and explained we weren’t accepting changes yet. And that was the last thing that was ever contributed.
  4. Blood Donors in Sweden Get a Text Message When Their Blood Helps Someone (Independent) — great idea to close the feedback loop. If you want to get more virtuous behaviour, make it a relationship and not a transaction. And if a warm feeling is all you have to offer in return, then offer it!