"databases" entries

Four short links: 5 January 2016

Inference with Privacy, RethinkDB Reliability, T-Mobile Choking Video, and Real-Time Streams

  1. Privacy-Preserving Inference of Social Relationships from Location Data (PDF) — utilizes an untrusted server and computes the building blocks to support various social relationship studies, without disclosing location information to the server and other untrusted parties. (via CCC Blog)
  2. Jepsen takes on Rethink — the glowingest review I’ve seen from Aphyr. As far as I can ascertain, RethinkDB’s safety claims are accurate.
  3. T-Mobile’s BingeOn ‘Optimization’ Is Just Throttling (EFF) — T-Mobile has claimed that this practice isn’t really “throttling,” but we disagree. It’s clearly not “optimization,” since T-Mobile doesn’t alter the actual content of the video streams in any way.
  4. qminer — BSD-licensed data analytics platform for processing large-scale, real-time streams containing structured and unstructured data.
Four short links: 17 December 2015

Structured Image Concepts, Google's SDN, Lightbulb DeDRMing, and EFF SF

  1. Visual Genome — a data set, a knowledge base, an ongoing effort to connect structured image concepts to language.
  2. Google’s Software Defined Networking — [What was the biggest risk you faced rolling out the network? …] we were breaking the fate-sharing principle—which is to say we were putting ourselves in a situation where either the controller could fail without the switch failing, or the switch could fail without the controller failing. That generally leads to big problems in distributed computing, as many people learned the hard way once remote procedure calls became a dominant paradigm.
  3. Philips Backtrack on Lightbulb DRM — In view of the sentiment expressed by our customers, we have decided to reverse the software upgrade so that lights from other brands continue to work as they did before with the Philips Hue system.
  4. Pwning Tomorrow — EFF Publishes SF Anthology. You can expect liberties and freedoms to feature.
Four short links: 8 December 2015

Open Source ZeroDB, HTTP Statuses, Project Activity, and Database Readings

  1. ZeroDB is Open Source — end-to-end encrypted database goes open source (AGPL, *ptui*).
  2. Choosing an HTTP Status Code — or “an alternative to engineers duelling.”
  3. Open Source Monthly — views of open source projects through their GitHub activity.
  4. Readings in Database Systems (5ed) — HTML and PDF versions of the papers.
Four short links: 25 November 2015

Faking Magstripes, Embedded Database, Another Embedded Database, and Multicamera Array

  1. magspoof — a portable device that can spoof/emulate any magnetic stripe or credit card “wirelessly,” even on standard magstripe readers.
  2. LittleD — open source relational database for embedded devices and sensor nodes.
  3. iondb — open source key-value datastore for resource-constrained systems.
  4. Stanford Multicamera Array — 128 cameras, reconfigurable. If the cameras are packed close together, then the system effectively functions as a single-center-of-projection synthetic camera, which we can configure to provide unprecedented performance along one or more imaging dimensions, such as resolution, signal-to-noise ratio, dynamic range, depth of field, frame rate, or spectral sensitivity. If the cameras are placed farther apart, then the system functions as a multiple-center-of-projection camera, and the data it captures is called a light field. Of particular interest to us are novel methods for estimating 3D scene geometry from the dense imagery captured by the array, and novel ways to construct multi-perspective panoramas from light fields, whether captured by this array or not. Finally, if the cameras are placed at an intermediate spacing, then the system functions as a single camera with a large synthetic aperture, which allows us to see through partially occluding environments like foliage or crowds.
Four short links: 18 November 2015

Crypto Comms, Science Funding, Geo DB, and AI Ambitions

  1. If The Paris Hackers Weren’t Using Crypto, The Next Ones Will (Cory Doctorow) — But the reality is that criminals will be using crypto soon, if they aren’t already, for the same reason they’re using computers. Using crypto is the best way to communicate.
  2. Google $50M Heart Disease Effort — instead of taking bids for $250K chunks of the money, they will fund one team for five years. Applications close Feb 14.
  3. Pyro (Usenix) — This paper presents Pyro, a spatial-temporal big data storage system tailored for high-resolution geometry queries and dynamic hotspots. Pyro understands geometries internally, which allows range scans of a geometry query to be aggregately optimized. Moreover, Pyro employs a novel replica placement policy in the DFS layer that allows Pyro to split a region without losing data locality benefits.
  4. Inside Mark Zuckerberg’s Bold Plan for Facebook (FastCompany) — “One of our goals for the next five to 10 years,” Zuckerberg tells me, “is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition.”
Four short links: 2 November 2015

Anti-Caching, Tyranny of Ratings, Distributed Deep Learning, and Sorting Rated Things

  1. Anti-Caching (PDF) — paper outlining a clever reframing of the database strategy of keeping frequently accessed things in-memory, namely pushing to disk the things that won’t be accessed … aka, “anti-caching.”
  2. The Rating Game (Verge) — Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings.
  3. Singa — Apache distributed deep learning platform turns 1.0.
  4. Scoring Items That Were Voted On or Rated — a Bayesian system to turn a set of ratings or up/down votes into a single score, such that you can sort a list from “best” to “worst.”
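For a feel of how the Bayesian scoring idea in item 4 can work, here is a minimal Python sketch. It is not necessarily the linked article's method: it assumes simple up/down votes and a Beta prior, and scores each item by its posterior mean. The names `Item`, `bayesian_score`, and `rank`, and the prior values, are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    ups: int
    downs: int


def bayesian_score(ups, downs, prior_ups=1.0, prior_downs=1.0):
    """Posterior mean of a Beta(prior_ups, prior_downs) prior after the votes.

    The prior acts like a handful of imaginary votes, so an item with very
    few real votes is pulled toward the prior mean instead of jumping to 0 or 1.
    """
    return (ups + prior_ups) / (ups + downs + prior_ups + prior_downs)


def rank(items, prior_ups=1.0, prior_downs=1.0):
    """Sort items from "best" to "worst" by their Bayesian score."""
    return sorted(
        items,
        key=lambda it: bayesian_score(it.ups, it.downs, prior_ups, prior_downs),
        reverse=True,
    )


# Hypothetical vote counts, made up for the example.
items = [Item("a", 1, 0), Item("b", 90, 10), Item("c", 5, 5)]
ranked = [it.name for it in rank(items)]
print(ranked)
```

A plain average would put "a" (one vote, 100% positive) at the top; with the prior in place, the heavily voted "b" outranks it.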
Four short links: 18 September 2015

Mass Customization, Monolithic Codebase, Database Implementation, and Encrypted Databases

  1. The Wild Wild East (The Economist) — Fung Retailing Limited, a related firm, has over 3,000 outlets, a third of them in China. Victor Fung, its honorary chairman, sees the era of mass production giving way to one of mass customization. Markets are fragmenting and smartphones are empowering consumers to get “directly involved in what they buy, where it is made and how they buy it.” Zhao Xiande of CEIBS in Shanghai points to Red Collar, a firm that used simply to make and export garments. Now it lets customers the world over design their own shirts online and makes them to order. Another outfit, Home Koo, offers custom-built furniture online.
  2. Motivation for a Monolithic Codebase (YouTube) — interesting talk about Google’s codebase, the first time I know of that Google’s strategy for source code management was discussed in public.
  3. SQL in CockroachDB: Mapping Table Data to Key-Value Storage — very easy-to-follow simple database implementation lesson.
  4. cryptdb — a database system that can process SQL queries over encrypted data.
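The table-to-key-value mapping that item 3 links to can be pictured with a toy Python sketch. This is a simplification under stated assumptions, not CockroachDB's actual encoding: keys here are human-readable strings of the form `/table/primary-key/column`, the store is a plain dict, and `encode_row` and `scan_row` are hypothetical helper names.

```python
def encode_row(table, pk_column, row):
    """Turn one table row into key-value pairs, one pair per non-key column."""
    pk = row[pk_column]
    return {
        f"/{table}/{pk}/{col}": val
        for col, val in row.items()
        if col != pk_column
    }


def scan_row(store, table, pk):
    """Rebuild a row's columns by scanning every key under its prefix."""
    prefix = f"/{table}/{pk}/"
    return {
        key[len(prefix):]: val
        for key, val in sorted(store.items())
        if key.startswith(prefix)
    }


# A dict stands in for the key-value store (illustrative only).
store = {}
store.update(encode_row("users", "id", {"id": 7, "name": "ada", "email": "ada@example.com"}))
store.update(encode_row("users", "id", {"id": 8, "name": "bob", "email": "bob@example.com"}))

row = scan_row(store, "users", 7)
print(row)
```

Because every column of a row shares the same key prefix, reading a row back becomes a prefix scan over neighbouring keys in a sorted store.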
Four short links: 15 September 2015

Bot Bucks, Hadoop Database, Futurism Biases, and Tactile Prosthetics

  1. Ashley Madison’s Fembot Con (Gizmodo) — As documents from company e-mails now reveal, 80% of first purchases on Ashley Madison were a result of a man trying to contact a bot, or reading a message from one.
  2. Terrapin — Pinterest’s low-latency NoSQL replacement for HBase. See engineering blog post.
  3. Why Futurism Has a Cultural Blindspot (Nautilus) — As the psychologist George Loewenstein and colleagues have argued, in a phenomenon they termed “projection bias,” people “tend to exaggerate the degree to which their future tastes will resemble their current tastes.”
  4. Mind-Controlled Prosthetic Arm (Quartz) — The robotic arm is connected by wires that link up to the wearer’s motor cortex — the part of the brain that controls muscle movement — and sensory cortex, which identifies tactile sensations when you touch things. The wires from the motor cortex allow the wearer to control the motion of the robot arm, and pressure sensors in the arm that connect back into the sensory cortex give the wearer the sensation that they are touching something.
Four short links: 18 August 2015

Chris Granger Ships, Disorderly Data-Centric Languages, PCA for Fun and Fashion, and Know Thy History

  1. Eve, Version 0 (Chris Granger) — Version 0 contains a database, compiler, query runtime, data editor, and query editor. Basically, it’s a database with an IDE. You can add data both manually or through importing a CSV and then you can create queries over that data using our visual query editor.
  2. BOOM: Berkeley Orders Of Magnitude — an effort to explore implementing Cloud software using disorderly, data-centric languages.
  3. Eigenstyle — clever analysis and reconstruction of images through principal component analysis. And here are “prettiest ugly dresses,” those that I classified as dislikes, that the program predicted I would really like.
  4. Turing Digital Archive — many of Turing’s letters, talks, photographs, and unpublished papers, as well as memoirs and obituaries written about him. It contains images of the original documents that are held in the Turing collection at King’s College, Cambridge. (Timely as Jason Scott works to save a manual archive: [1], [2], [3])
Four short links: 8 July 2015

Encrypted Databases, Product Management, Patenting Machine Learning, and Programming Ethics

  1. Zero Knowledge and Homomorphic Encryption (ZDNet) — coverage of a few startups working on providing databases that don’t need to decrypt the data they store and retrieve.
  2. How Not to Suck at Making Products — Never confuse “category you’re in” with the “value you deliver.” Customers only care about the latter.
  3. Google Patenting Machine Learning Developments (Reddit) — I am afraid that Google has just started an arms race, which could do significant damage to academic research in machine learning. Now it’s likely that other companies using machine learning will rush to patent every research idea that was developed in part by their employees. We have all been in a prisoner’s dilemma situation, and Google just defected. Now researchers will guard their ideas much more combatively, given that it’s now fair game to patent these ideas, and big money is at stake.
  4. Machine Ethics (Nature) — machine learning ethics versus rule-driven ethics. Logic is the ideal choice for encoding machine ethics, argues Luís Moniz Pereira, a computer scientist at the Nova Laboratory for Computer Science and Informatics in Lisbon. “Logic is how we reason and come up with our ethical choices,” he says. I disagree with his premises.