"graph databases" entries

Four short links: 23 March 2016

Four short links: 23 March 2016

Graph Query, API Economy, Mutual Interest, and The Multithreading Organization

  1. Dragon: A Distributed Graph Query Engine — Facebook describes its internal graph query engine. [T]he layout of these indices on storage is optimized based on a deeper understanding of query patterns (e.g., many queries are about friends), as opposed to accepting random sharding, which is common in these systems. Wisely, the system is tailored to the use cases they have and the patterns they see in access.
  2. Almost Everyone Is Doing the API Economy Wrong (Techcrunch) — Redux: your API should help you make money when the API customer makes money, and you should set clear expectations for what’s acceptable and what’s not. But every developer should be forced to write 100 times: “if you build on a platform you don’t own, you’re building on a potential and probable future competitor.”
  3. Traditional Economics Failed, Here’s a Blueprint — runs through the shifts happening in our thinking about the world and ourselves (simple to complex, independent to interdependent, rational calculator to irrational approximators, etc) and concludes: True self-interest is mutual interest. The best way to improve your likelihood of surviving and thriving is to make sure those around you survive and thrive. See above API note.
  4. Blitzscaling (HBR) — as you move from village to city, functions are beginning to be differentiated; you’re really multithreading. I could write a thesis on the CAP theorem for business. And I have definitely worked for companies that have a “share nothing” approach to solving their threading issues.
Four short links: 31 December 2015

Four short links: 31 December 2015

Reverse Engineering Playground, Feeding Graph Databases, Lessig, and Fantasies of Immortality

  1. crackmes.de — practice playground for reverse engineering and breaking protections.
  2. Feeding Graph Databases — exploring using logging systems to feed graph databases.
  3. Lessig Interview (WSJ) — the slogan says regulation should be more technology neutral. I am not sure I ever heard a more idiotic statement in my life. There is no neutrality here, just different modes. … I don’t what think the law should say here is what services can do and not do, because the technology is so (fast-changing) the law could never catch up. But that what (we want) to avoid are certain kinds of business models, a prison of bits, where services leverage control over access to content and profit from that control over content.
  4. Bubble-Driven PseudoscienceIn terms of life extension, here are the real opportunities: closing the gap between black and white patients, lowering the infant mortality rate, and making sure the very poorest among us have access to adequate care. You can make sure that many people live longer, right now! But none of this is quite as sexy as living forever, even though it’s got a greater payoff for the nation as a whole. So instead of investing in these areas, you’ve got a bunch of old white men who are afraid to die trying to figure out cryonics.
Four short links: 15 December 2015

Four short links: 15 December 2015

Barbie Broken, JSON Database, Lightbulb DRM, and Graph Database

  1. Crypto is Hard says Hello BarbieWe discovered several issues with the Hello Barbie app including: it utilizes an authentication credential that can be re-used by attackers; it connects a mobile device to any unsecured Wi-Fi network if it has “Barbie” in the name; it shipped with unused code that serves no function but increases the overall attack surface. On the server side, we also discovered: client certificate authentication credentials can be used outside of the app by attackers to probe any of the Hello Barbie cloud servers; the ToyTalk server domain was on a cloud infrastructure susceptible to the POODLE attack. (via Ars Technica)
  2. Kinto — Mozilla’s open source lightweight JSON storage service with synchronisation and sharing abilities. It is meant to be easy to use and easy to self-host.
  3. Philips Blocks 3rd Party Lightbulbs — DRM for light fixtures. cf @internetofsh*t
  4. gaffer — GCHQ-released open source graph database. …a framework that makes it easy to store large-scale graphs in which the nodes and edges have statistics such as counts, histograms, and sketches. These statistics summarise the properties of the nodes and edges over time windows, and they can be dynamically updated over time. Gaffer is a graph database, rather than a graph processing system. It is optimised for retrieving data on nodes of interest. IHNJH,IJLTS “nodes of interest.”

Graph databases are powering mission-critical applications

The O’Reilly Data Show Podcast: Emil Eifrem on popular applications of graph technologies, cloud computing, and company culture.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.

350px-VASR_7_map_svg

While most people associate graphs with social media analysis, there are a wide range of applications — including recommendations, fraud detection, I.T. operations, and security — that are routinely framed using graphs. This wide variety of use cases has led to rise to many interesting tools for storing, managing, visualizing, and analyzing massive graphs. The important thing to note is that graph databases are not limited to reporting and analytics, but are also being used to power mission critical applications.

In this episode of the O’Reilly Data Show, I sat down with Emil Eifrem, CEO and co-founder of Neo Technology. We talked about the early days of NoSQL, applications of graph databases, cloud computing, and company culture in the U.S. and Sweden.

Graph and NoSQL databases

The relational database had been an accelerator, and here it’s really slowing us down. What we ended up concluding was that the problem was this mismatch between the shape of the data and the abstractions that were exposed by our infrastructure. At that point, we said, okay, what if we had a database that just exposed these amazing network-oriented data structures or graph-oriented data structures, but other than that, had all the properties of a relational database. Wouldn’t that be great? …  Ultimately, we said the famous last words: ‘Hey, let’s just build it ourselves. How hard can it be?’ It turns out it’s 15 years later!

2007 is when both the Dynamo paper had been published and the BigTable paper had been published out of Amazon and Google, respectively. That’s when, in early adopter circuits, the discourse started to change … maybe the era of the one-size-fits-all database is over. Maybe our job isn’t to take all of our data and shove it through a relational database. Maybe there are some other tools and technologies and abstractions out there that make better sense for some data. That was in ’07.  I really think it was as if lightning struck in the community. … . [Dynamo and BigTable were announced] and the next day, 12 open source projects, implementing it, and then the next day, 24 new ones. It was just crazy back then.

Read more…

Four short links: 23 October 2015

Four short links: 23 October 2015

Data Science, Temporal Graph, Biomedical Superstars, and VR Primer

  1. 50 Years of Data Science (PDF) — Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere “scaling up,” but instead the emergence of scientific studies of data analysis science-wide.
  2. badwolfa temporal graph store from Google.
  3. Why Biomedical Superstars are Signing on with Google (Nature) — “To go all the way from foundational first principles to execution of vision was the initial draw, and that’s what has continued to keep me here.” Research to retail, at Google scale.
  4. VR Basics — intro to terminology and hardware in the next gen of hardware, in case you’re late to the goldrush^w exciting field.

Building applications in Azure

Identifying the key requirements of a web application cloud architecture.

Download a free copy of “Azure for Developers,” an O’Reilly report by experienced .NET developer John Adams that breaks down Microsoft’s Azure platform in plain language, so that you can quickly get up to speed.

One of the most natural uses of the cloud is for web applications. You may already be using virtual machines on your own systems to make deploying your applications easier, either to new hardware or to additional servers. Microsoft Azure uses virtualization too, but it also brings useful benefits that virtualization cannot deliver alone. By hosting your application in the cloud, you can leverage automatic scaling, load balancing, system health monitoring, and logging. You also benefit from the fact that managed cloud platforms help narrow the attack surface of your system by automatically patching the operating system and runtimes and by keeping systems sandboxed. Let’s look at some examples of how to build some common web applications inside of Microsoft Azure.

Online store

Imagine that you work for a retailer who generates a significant amount of revenue through online sales. Imagine also that this retailer has been around for long enough that it already has an established web architecture that runs in a private data center. This retailer has decided that it wants to move to a hosted platform so that it no longer has any data center responsibilities and it can focus on its core business. How do you replatform this web application into Microsoft Azure? Let’s first identify some requirements for this system:

  • It has high utilization and needs to serve a large number of concurrent users without timing out, even during peak hours such as Black Friday sales.
  • It needs to accommodate a wide variety of products in its database that do not necessarily all follow the same schema.
  • It needs a fast and intelligent search bar so that customers can find products easily.
  • It needs to be able to recommend products to customers as they shop to help generate additional revenue.

However these requirements are being met today in the private data center, I can suggest some guidelines on how to reproduce this system in Microsoft Azure so you can boost performance instead of just replicating it. I will take each of these requirements in order and explain how to leverage certain Azure components so that these requirements are properly met.

Read more…

Four short links: 27 May 2015

Four short links: 27 May 2015

Domo Arigato Mr Google, Distributed Graph Processing, Experiencing Ethics, and Deep Learning Robots

  1. Roboto — Google’s signature font is open sourced (Apache 2.0), including the toolchain to build it.
  2. Pregel: A System for Large Scale Graph Processing — a walk through a key 2010 paper from Google, on the distributed graph system that is the inspiration for Apache Giraph and which sits under PageRank.
  3. How to Turn a Liberal Hipster into a Global Capitalist (The Guardian) — In Zoe Svendsen’s play “World Factory at the Young Vic,” the audience becomes the cast. Sixteen teams sit around factory desks playing out a carefully constructed game that requires you to run a clothing factory in China. How to deal with a troublemaker? How to dupe the buyers from ethical retail brands? What to do about the ever-present problem of clients that do not pay? […] And because the theatre captures data on every choice by every team, for every performance, I know we were not alone. The aggregated flowchart reveals that every audience, on every night, veers toward money and away from ethics. I’m a firm believer that games can give you visceral experience, not merely intellectual knowledge, of an activity. Interesting to see it applied so effectively to business.
  4. End to End Training of Deep Visuomotor Policies (PDF) — paper on using deep learning to teach robots how to manipulate objects, by example.
Four short links: 12 March 2015

Four short links: 12 March 2015

Billion Node Graphs, Asynchronous Systems, Deep Learning Hardware, and Vision Resources

  1. Mining Billion Node Graphs: Patterns and Scalable Algorithms (PDF) — slides from a CMU academic’s talk at C-BIG 2012.
  2. There Is No NowOne of the most important results in the theory of distributed systems is an impossibility result, showing one of the limits of the ability to build systems that work in a world where things can fail. This is generally referred to as the FLP result, named for its authors, Fischer, Lynch, and Paterson. Their work, which won the 2001 Dijkstra Prize for the most influential paper in distributed computing, showed conclusively that some computational problems that are achievable in a “synchronous” model in which hosts have identical or shared clocks are impossible under a weaker, asynchronous system model.
  3. Deep Learning Hardware GuideOne of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.
  4. Awesome Computer Vision — curated list of computer vision resources.
Four short links: 15 January 2015

Four short links: 15 January 2015

Secure Docker Deployment, Devops Identity, Graph Processing, and Hadoop Alternative

  1. Docker Secure Deployment Guidelinesdeployment checklist for securely deploying Docker.
  2. The Devops Identity Crisis (Baron Schwartz) — I saw one framework-retailing bozo saying that devops was the art of ensuring there were no flaws in software. I didn’t know whether to cry or keep firing until the gun clicked.
  3. Apache Giraphan iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
  4. Apache Flinka data processing system and an alternative to Hadoop’s MapReduce component. It comes with its own runtime, rather than building on top of MapReduce. As such, it can work completely independently of the Hadoop ecosystem. However, Flink can also access Hadoop’s distributed file system (HDFS) to read and write data, and Hadoop’s next-generation resource manager (YARN) to provision cluster resources. Since most Flink users are using Hadoop HDFS to store their data, we ship already the required libraries to access HDFS.
Four short links: 12 November 2014

Four short links: 12 November 2014

Material Design, Inflatable Robots, Printable Awesome, and Graph Modelling

  1. CSS and React to Implement Material Design — as I said earlier, it will be interesting to see if Material Design becomes a common UI style for the web.
  2. Current State of Inflatable Robots — I’d missed the amazing steps forward in control that were made in pneumatic robots. Check out the OtherLab tentacle!
  3. Dinosaur Skull Showerhead — 3D-printable add-on to your shower. (via Archie McPhee)
  4. Data Modelling in Graph Databases — how to build the graph structure by working back from the questions you’ll ask of it.