ENTRIES TAGGED "xml"
The power of a technology now taken for granted
A friend wanted to show me a great new thing in 1993, this crazy HTML browser called Cello. He knew I was working on hypertext and this seemed like just the thing for it! Sadly, my time in HyperCard and an unfortunate encounter with the HyTime specifications meant that I bounced off of it, because markup couldn’t possibly work.
I was, of course, very very wrong.
Markup with some brilliantly minimal hypertext options was about to launch the World Wide Web. The toolset was approachable, easy to apply to many kinds of information, and laid the foundation for greater things to come.
The standard for mathematical content in publishing work flows, technical writing, and math software
20 years into the web, math and science are still second class citizens on the web. While MathML is part of HTML 5, its adoption has seen ups and downs but if you look closely you can see there is more light than shadow and a great opportunity to revolutionize educational, scientific and technical communication.
Somebody once compared the first 20 years of the web to the first 100 years of the printing press. It has become my favorite perspective when thinking about web standards, the web platform and in particular browser development. 100 years after Gutenberg the novel had yet to be invented, typesetting quality was crude at best and the main products were illegally copied pamphlets. Still, the printing press had revolutionized communication and enabled social change on a massive scale.
In the near future, all our current web technology will look like Gutenberg’s original press sitting next to an offset digital printing machine.
With faster and faster release cycles it is sometimes hard to keep in mind what is important in the long run—enabling and revolutionizing human communication.
Since I joined the MathJax team in 2012, I have gained many new perspectives on MathML, the web standard for display of mathematical content, and its role in making scientific content a first class citizen on the web. But it is rather useless to talk about MathML’s potential without knowing about the state of MathML on the web. So let’s tackle that in this post.
Flow-based, functional, and more
“Small pieces loosely joined,” David Weinberger’s appealing theory of the Web, has much to say to programmers as well. It always inspires me to reduce the size of individual code components. The hard part, though, is rarely the “small” – it’s the “loose”.
After years of watching and wondering, I’m starting to see a critical mass of developers working within approaches that value loose connections. The similarities may not be obvious (or even necessarily welcome) to practitioners, but they share an approach of applying encapsulation to transformations, rather than data. Of course, as all of these are older technologies, the opportunity has been there for a long time.
The Return of Flow-Based Programming
“There’s two roles: There’s the person building componentry, who has to have experience in a particular program area, and there’s the person who puts them together,” explains Morrison. “And it’s two different skills.”
That separation of skills – programmers creating separate black box transformations and less-programmery people defining how to fit the transformations together – created a social problem for the approach. (I still hear similar complaints about the designer/programmer roles for web development.)
The map-like pictures that are drawing people to NoFlo, like this one for NoFlo Jekyll, show how the transformations connect, how the inputs and outputs are linked. I like the pictures, I’m not terribly worried that they will descend into mad spaghetti, and this division of labor makes too much sense to me. At the same time, though, it reminds me of a few other things.
This summer, I’ve seen all kinds of programming approaches as I’ve bounced between the Web, XSLT, Erlang, and XML, with visits to many other environments. As I look through the cool new possibilities for interfaces, for scaling up and down, and for dealing with data, I keep seeing two basic patterns repeating: walking trees (of data or document structure), and handling events.
Getting past the HTML / XML splits
A few years ago, I stopped talking about XML and starting talking about markup. After a few too many conversations with developers who had found XHTML, web services, and various other things that had proudly branded themselves with the “X,” it was clear from hostile responses that XML’s boom was done.
Many of Wednesday’s Balisage talks wrestled with the challenges XML tools face in a world dominated by competitors, especially HTML5. Today opened with a talk on making XForms work in HTML5 with browsers, followed immediately by a talk on replacing DocBook XML with XHTML5 (at O’Reilly). Two more abstract talks looked at filtering and an info space model, but the afternoon asked “Where did all the document kids go?” and then sought world domination by making XML invisible.
FtanML looks for the best of both
Today’s Balisage conference got off to a great start. After years of discussing the pros and cons of XML, HTML, JSON, SGML, and more, it was great to see Michael Kay (creator of the SAXON processor for XSLT and XQuery) take a fresh look at what a markup language should be.
OSCON 2013 Speaker Series
R is an open-source statistical computing environment similar to SAS and SPSS that allows for the analysis of data using various techniques like sub-setting, manipulation, visualization and modeling. There are versions that run on Windows, Mac OS X, Linux, and other Unix-compatible operating systems.
To follow along with the examples below, download and install R from your local CRAN mirror found at r-project.org. You’ll also want to place the example CSV into your Documents folder (Windows) or home directory (Mac/Linux).
After installation, open the R application. The R Console will pop-up automatically. This is where R code is processed. To begin writing code, open an editor window (File -> New Script on Windows or File -> New Document on a Mac) and type the following code into your editor:
Place your cursor anywhere on the “1+1” code line, then hit Control-R (in Windows) or Command-Return (in Mac). You’ll notice that your “1+1” code is automatically executed in the R Console. This is the easiest way to run code in R. You can also run R code by typing the code directly into your R Console, but using the editor is much easier.
If you want to refresh your R Console, click anywhere inside of it and hit Control-L (in Windows) or Command-Option-L (in Mac).
Now let’s create a Vector, the simplest possible data structure in R. A Vector is similar to a column of data inside a spreadsheet. We use the
combine function to do so:
raysVector <- c(2, 5, 1, 9, 4)
To view the contents of
raysVector, just run the line of code above. After running the code shown above, double-click on
raysVector (in the editor) and then run the code that is automatically highlighted after double-clicking. You will now see the contents of
raysVector in your R Console.
The object we just created is now stored in memory and we can see this by running the following code:
R is an interpreted language with support for procedural and object-oriented programming. Here we use the
mean statistical function to calculate the statistical mean of
Getting help on the
mean function is easy using:
We can create a simple plot of
barplot(raysVector, col = "red")
Importing CSV files is simple too:
data <- read.csv("raysData3.csv", na.strings = "")
We can subset the CSV data in many different ways. Here are two different methods that do the same thing:
data[ 1:2, 2:4 ] data[ 1:2, c("age", "weight", "height") ]
There are many ways to transform your data in R. Here’s a method that doubles everyone’s age:
dataT <- transform( data, age = age * 2 )
apply function allows us to apply a standard or custom function without loops. Here we apply the mean function
column-wise to the first 3 rows of the dataset in order to analyze the age and height columns of the dataset. We will also ignore missing values during the calculation:
apply( data[1:3, c("age", "height")], 2, mean, na.rm = T )
Here we build a linear regression model that predicts a person’s weight based on their age and height:
raysModel <- lm(weight ~ age + height, data = data)
We can plot our residuals like this:
plot(raysModel$residuals, pch = 15, col = "red")
Effectively control Windows from the console
Here’s a slick PowerShell 3.0 one-liner. If you want to pull down an RSS feed from a blog, displaying only the title and publication date try:
Invoke-RestMethod "http://www.dougfinke.com/blog/index.php/feed/" | Select title, pubdate
It’s that simple. No looping, no checking end of stream, no XSLT to handle transforming the XML from the RSS feed, but wait, there’s more. This array of objects is now connected to the entire PowerShell ecosystem. PowerShell is based on .NET so you can use ADO.NET to insert it into a database, use
Invoke-RestMethod again and post it to another REST endpoint or spin up Microsoft Excel and control it via its COM API. And that my friends, is the two foot dive into the PowerShell ocean.
PowerShell is Microsoft’s task automation framework, consisting of a command-line shell, an integrated scripting environment (ISE), a scripting language built on .NET Framework, an API allowing you to host PowerShell in your .NET applications, and it is a distributed automation platform. This means if you have PowerShell running on another box, you can remotely execute PowerShell there, if you have the credentials.
What you need to do is launch the PowerShell console. On my Windows 8 box I press the Windows button, type “
powers“, and hit enter.
Great! I’ve got a blank blue screen. Now what?
It's time for developers to create their own vocabularies
When HTML first appeared, it offered a coherent if limited vocabulary for sharing content on the newly created World Wide Web. Today, after HTML has handed off most of its actual work to other specifications, it’s time to stop worrying about this central core and let developers choose their own markup vocabularies and processing.
When the W3C first formed, it formed around HTML, the core standard of content on the Web, defining the structure, appearance, and behavior of content. Over the next few years, however, it became clear that HTML was doing too much, and the W3C and other groups refactored appearance, behavior, and many semantics into separate specifications:
Cascading Style Sheets (CSS) took responsibility for presentation and layout.
WAI-ARIA took responsibility for accessibility semantics, ensuring that content remained available to a broad audience even if developers pushed the current boundaries of markup.
Doing less and more than XML.
It’s easy to forget that XML started out as a simplification process, trimming SGML into a more manageable and more parseable specification. Once XML reached a broad audience, of course, new specifications piled on top of it to create an ever-growing stack.
That stack, despite solving many problems, brings two new issues: it’s bulky, and there are a lot of problems that even that bulk can’t solve.
Two proposals at last week’s Balisage markup conference examined different approaches to working outside of the stack, though both were clearly capable of working with that stack when appropriate.