I gave the keynote today at WebWise 2013, and I have to say, after a long week at SXSWedu, I was pretty happy to spend the day around a bunch of librarians and archivists. The theme of this year's WebWise was "Putting the Learner at the Center," and my talk echoed something I've been pushing a lot lately: this question of who owns educational data. I was especially eager to raise this question in front of this particular crowd, as I believe that IMLS-ish folks will be key in helping answer it. (No pressure, guys!)
Below is a rough transcript of my talk, along with a Storify of some of the tweets and a copy of my slides.
Whose Learning Is It Anyway?
Some of you might recognize the title of this talk as a nod to the TV program “Whose Line Is It Anyway?” — a show that ran for 8 seasons on ABC and which is apparently coming back to cable after a 10-year hiatus. Bonus points if this title conjures the British version of the show rather than the Drew Carey-hosted one. Double bonus points if you think of the radio show that predated both.
“Whose Line Is It Anyway?” was/is a comedy show — a game show, but only sort of — where the contestants had to compete in various challenges that tested their improvisational skills (and sometimes their skills at singing and impressions).
Improvisation is a particularly interesting form of comedy — an incredibly challenging but rewarding form of theater.
With improv, you must hold in your head as many cultural, historical, and literary references as you can. These must be quickly and readily accessible. Characters, themes, situations, voices, postures, gestures. Performers must be able to recall, remix, collaborate, innovate, pivot, and hopefully make the audience laugh.
Now this certainly sounds like a slew of tech-industry buzzwords, doesn’t it? Remix, collaborate, innovate, pivot. And as such, I’m sure there’s someone who might hear that and think, “Wow, let’s disrupt improv! This is a job for a database or an app: index it all, access it in real time to maximize humor!”
But can a computer do improv?
The Turing Test – the test to see if a machine is “intelligent” enough to fool a human – doesn’t necessarily help us here.
IBM’s AI machine Watson did appear on another game show after all — although on Jeopardy, not on Whose Line Is It Anyway?
Twitter bots use far less sophisticated technology — as in something I can program. Bots like the incredibly popular @horse_ebooks string together random phrases that look a bit like improv… and sometimes they’re funny. But unintentionally so.
I would contend there’s a difference here between the programmatic and the improvisational.
I’m currently working on a book on artificial intelligence and education technology — our decades-long quest to build teaching machines — so I have been thinking a lot about these things lately: about how our increasing use of AI, a field that relies a great deal on machine learning, might shape what we think about human learning.
The idea for the book came to me when I was in one of Google’s self-driving cars, along with one of the car’s inventors, Sebastian Thrun. As we zipped along Interstate 280, he explained all the cameras and sensors the car possessed, internally and externally, and all the mapping and traffic data that Google had amassed: how all of this was going into building a car that does not need a human to steer it or to press on the brakes or the accelerator. In the future of self-driving cars, Thrun said, cars will move along the highway much more efficiently.
Now I confess, as someone who doesn’t drive and who recently moved to LA, I was thrilled with the idea of the robot cars.
But then I thought about Sebastian Thrun’s latest endeavor, Udacity, his startup offering massive open online courses, and I balked. “Wait, no!” I’m not too keen on the notion of automating education for the sake of efficiency.
I did wonder whether I was simply construing the self-driving car as a metaphor for education technology, or whether this really was the model that artificial intelligence researchers used to think about our “learning journeys,” if you will.
It’s worth pointing out that the three major MOOC initiatives — Udacity, Coursera, and edX — all have their origins in the AI lab. Daphne Koller, Andrew Ng, and Sebastian Thrun are all AI professors at Stanford; Ng took over as head of Stanford’s AI lab when Thrun stepped down. Anant Agarwal, the head of edX, is the former head of MIT’s AI lab and a developer of exascale computing technology.
As such, these MOOC endeavors could be read as part of the long-running efforts on the part of AI researchers to develop automated teaching machines and intelligent tutoring systems.
If we just have enough data — from content to assessment data and sure, from the tens of thousands of students in massive online courses and all their keyboard and mouse clicks — we might be able to build algorithms and models that are “personalized” and “adaptive.”
But can that system ever really look like improv? Can it look like open inquiry? Can it look like self-driven learning?
If a tree falls in the road in front of a self-driving car, the car shuts down. It doesn’t go around it. It doesn’t take a different route. It stops. The self-driving car cannot handle that sort of serendipitous event.
And here we move to the heart of the matter — from “whose line is it anyway?” to “whose learning is it?” And let’s start with the data — because certainly there are many systems — robot teachers and robot graders and adaptive apps and quizzes — being built on top of it.
Who owns the learning? Who owns student data? Who owns our education data after we’re out of school? Who owns learners’ data across the variety of institutions — formal and informal — where we continue to learn throughout our lives?
I posed that question on Twitter a week or so ago. Do students own it? Schools? The government? Software providers?
The answers were varied. Some people insisted that education data belongs to the student; others insisted that it belongs to whoever collects it. The discrepancies, to a certain extent, no doubt reflect different levels of awareness about, and different definitions of, education data. What counts as education data, anyway? I certainly don’t think it just means student test scores and student ID numbers.
And honestly, it’s probably not too hard to argue that our lack of a strong stance or understanding on this topic goes for all our digital data: who’s collecting it, to what end, under what legal protections or restrictions.
These questions aren’t entirely new, but our increasing use of technologies is creating lots of new data — and lots more data — some 2.5 quintillion bytes of data created every day according to IBM — and we are facing numerous challenges and opportunities as a society over what it means to control and access and — in our case here, I’d imagine — learn from it.
Yet the question of ownership of education data remains largely – and troublingly – unresolved.
A personal anecdote: a couple of years ago, my mum gave me a large manilla envelope full of my old schoolwork — drawings and writings and photos from as far back as preschool — some projects I remembered making, many I didn’t. Mostly the envelope contained administrative records — my report cards, various certificates of accomplishment, some ribbons.
That envelope was obviously a low-tech way to collect my school records. It is certainly my mother’s curation of “what counts” as my education data – as such, a reflection of proud parenting and of schooling in a pre-digital age, I suppose. Nonetheless I think the manilla envelope makes for an interesting metaphor — a model to think about storing education data, one with strengths and weaknesses and strange relevancies for our thinking about the digital documentation and storage of education data today.
What happens now that our schoolwork is increasingly “born digital”? Is there a virtualized equivalent to my mum’s envelope?
Or — and this is what I often fear — are we creating education-related content in apps, on websites, in learning management systems that we will only have temporary access to?
Once we put our content in, can we get our content out again — and out in a format that’s actually readable, by humans and by machines?
Another anecdote: last summer, I met a young girl whose school was piloting a one-to-one iPad program. This girl’s family weren’t particularly tech-oriented. They didn’t have a computer at home. So when the school offered them, at the beginning of the year, a chance to buy the iPad, they declined. It was expensive. They didn’t see the point. But by the end of the school year, their minds had changed — one of those stories that sounds at first glance like a PR win for Apple — the device was easy to use, the girl loved it, she’d downloaded some other apps, she’d created a lot of drawings and written a lot of stories with it. And so the family approached the school about buying the iPad. But it was too late, the school said. The purchasing opportunity was a limited one at the beginning of the year. And the iPad was returned — with all this girl’s data on it. There was no manilla envelope — physical or digital — for much of her 6th grade schoolwork.
The family had no home iTunes account with which to sync the student’s data, and that’s what you’re “supposed” to do to get your data off an iPad. But even more troubling, schools tend to create “dummy” accounts for these devices. So even if you did have your own iTunes account at home, it wouldn’t matter: your school device is registered to the school’s generic email address, not to you.
We need to think about whether that’s the place we want to store all our data — all our kids’ data.
This isn’t just about Apple, of course. We must ask: Is there a safe digital place – any safe place – where we can store our school work and our school records — not just for the duration of a course or for the length of a school year, but for “posterity”?
“Posterity” — why, that word sounds a lot like “Posterous,” doesn’t it? “Posterous” — the microblogging platform that was acquired by Twitter last year and that announced a couple of weeks ago that it would be shutting down at the end of April. Posterous — a free tool that many students and educators and librarians (among others) were using for sharing and storing writing, photos, video, and other digital content.
The impending closure of Posterous is hardly the first or the only time something like this has happened to a tool — free or paid – that’s been popular for educational purposes. Heck, we often demand students put their school work into a learning management system where they lose access to it at the end of the semester. And for that pleasure, schools spend hundreds of thousands of dollars.
So a shout-out here to the University of Mary Washington and its “Domain of One’s Own” initiative that gives domains, Web hosting, and technical training to faculty and students. As the name of the initiative suggests, students own their space on the Web. They own their own domain. They control it as students, they’re encouraged to use it as an electronic portfolio, and here’s the crucial point — they can take it with them when they graduate.
The impending demise of Posterous prompts us to ask — yet again: Are we storing our digital education content in a place that we actually control, that we actually own?
Do we — can we — own our education data? Whose data is it — whose learning is it?
My manilla envelope might appear to offer better control over my educational content, but that’s not necessarily the case.
The papers that I have in my possession are, in many instances, just a transcript. A copy. My schools retain the originals. Or I guess they do. Some of these report cards are decades old. Regardless — anyone who’s had to send money to their alma mater in order to request an official copy of their transcript has probably cursed this particular arrangement. “I earned those grades, dammit.” Give me my data.
The school retains my data, although according to FERPA (the Family Educational Rights and Privacy Act, the law that in the U.S. governs the privacy of educational records) that data is mine to review and correct. And according to FERPA, I have some say over who it’s shared with. As do my parents, until I turn 18.
But despite its claims to protect the privacy of students’ records, nowhere does FERPA say that a student actually “owns” her or his data. Nowhere does it say that a school does either. At best, it would seem, the education institution is a steward for the “official education record,” responsible for its storage, its security, and its protection. And truth be told, the terms of “ownership” are mostly spelled out in agreements between individual schools and the vendors of the databases and software they buy or license.
So let’s push this question further: what data exactly are schools and other education-related institutions stewards for? Just what’s on the transcript — that is, dates of attendance, major, and final course grades?
What about behavior records? Test scores? Individual assignments?
What about library check-outs? Gym visits? Sports records? Cafeteria and bookstore purchases? Minutes from student meetings? Times in and out of the dormitory?
What about all the data that is being collected on and generated by students? (I should clarify again here that when I say “data,” I don’t just mean numbers. Essays and photos and videos are data too.)
What about students’ search engine history? Learning management system log-ins and the duration of their LMS sessions? Blog and forum comment history? Internet usage while on campus? Emails sent and received? Social media profiles? Pages read in digital textbooks? Videos watched on Coursera or Khan Academy or Udacity, along with whether and where students paused them? Exercises completed on any of these platforms? Wikipedia visits? Wikipedia edits? Levels on Angry Birds? Keystrokes and mouse clicks logged?
(That last item is, along with biometric data, how Coursera says it plans to confirm students’ identities.)
Do students own this data? Do they control any of it? Can they access it? Download it? Review it?
And here’s a very important question: Are students even aware that this data is being collected?
And: Are they asked for their consent before it’s shared?
Now, none of this sort of data is included in the manilla envelope my mum gave me, quite obviously. But I think it’s worth asking if a digital version of that envelope — whatever long-term storage unit we devise for students’ education data — should include these sorts of things.
Now, I can’t deny: much of my interest in the manilla envelope my mum saved was simply nostalgic. There were good memories and bad memories and forgotten memories from my schooling, and I was grateful that my mum had saved all that paper, even though it was decidedly her record of me — the items that she had chosen to save for me.
And when we talk about protecting and preserving students’ educational content — making sure that it doesn’t disappear like all those Posterous blogs are about to — I think much of our concern is about maintaining that record for the future. Collection for the sake of recollection.
That differs — substantially at times — from collecting your data in order to control it. And that differs from collecting it in order to analyze it.
What insights could I glean about myself as a learner from the contents of my manilla envelope — things unknown and unreflected upon, pulled out of the forgotten drawings and scribbled passions of my childhood?
And how might those insights have differed if I had been able to review this education data on a real-time and ongoing basis, not just 20-some-odd years later?
The ability to glean insights — in real-time or near-real-time — from students’ data is the cornerstone of the emerging field of learning analytics. And while there are certainly many obstacles to making use of the data that schools already collect about students — thanks to information silos that are technological and departmental and political — the drive for better learning analytics, pushed by both business and research interests, will make students’ data increasingly important for all manner of education institutions.
So again, I ask, who owns our education data?
I’ve spent the better part of this week at SXSWedu where it was very clear that student data is of major interest to education companies. This certainly reflects a larger trend in the technology sector — all this buzz about “big data.”
There’s long been a saying among tech folks that “if you aren’t paying for the product, you are the product.” It’s often applied to free tools like Facebook and Google that do use your personal data to sell advertising. But in an increasingly data-oriented world, we might have to admit that even if you are paying for the product, you’re still the product. Your data certainly is.
And companies that have long gathered data about all our transactions and demographics are starting to sift through all that data — in order to improve the product, in order to improve the marketing, in order to beat their competition. There’s a sense — and the metaphor here is pretty horrible if you stop to think about it — that data is the new “oil” and our lives are set to be mined with the value extracted from them. How can we make sure that value stays with us?
Many of the panels at SXSWedu addressed education data (not this question of who owns it, I should add, but the question of what to do with it). That’s not surprising, since one of the major sponsors of the event was inBloom, a new data infrastructure project that’s been funded by a $100 million investment by the Gates Foundation and built by the News Corp-owned Wireless Generation.
inBloom plans to build a centralized database of educational data, arguing — and this is true — that schools’ data infrastructure is woefully out of date and that data is often siloed in various apps and student information systems. But inBloom isn’t just pulling in the data typically contained in a student information system — that is, your name, your grade, your grades. This database will include health care records, behavioral records, and much, much more.
The promise: to make learning more personalized and more adaptive. The vision: to build a platform that other third-party software providers can build upon and that schools and perhaps other learning institutions too can utilize.
When I asked Twitter “who owns your education data?” one of the responses was “it doesn’t matter.” The data “has no value except to those who take positive steps to use it.” Framed this way, it doesn’t matter if a student or a school or a software provider or a governmental agency owns the data, as long as its usage is beneficial. Certainly this is the promise of learning analytics: to enhance student outcomes, to boost student retention, and to increase course completion.
Now, I won’t argue that these aren’t “positive” uses for students, nor that students don’t want these things for themselves.
But if students do not own and do not control their data, then I fear (again) that data and analytics will be something we do to students, rather than do for them or do with them. Or — and here’s a radical notion — that we enable students to do for themselves.
I think this is why, for me, the “quantified self” movement seems an appealing and important development, not just for how we think of education data but for how we think of all the types of data that we currently create: on social media sites, on blogs, with our smartphones, with personal body sensors, with our credit cards, with our geolocation, with our searches and transactions and clicks.
A “quantified self” movement within education implies personal ownership and certainly demands personal control over data. As such, it requires setting personal goals. It requires a personal definition of “learning.”
It would also require a certain familiarity with the technologies that students use; it would require, one would imagine, an understanding of how to retrieve data and build data visualizations. It would require that we read the Terms of Service and avoid applications with onerous ones.
These are all good things, I’d say, empowering things for students — for all of us truthfully — all with the goal of making us subjects and not just objects of technology and research and data.
If we build, then, that virtual version of my mum’s manilla envelope — in the service of not just long-term personal content storage but real-time personal learning analytics — it would demand that many things change in how we think about education data today.
Data would need to be portable; it would need to be interoperable. It would need to be human- and machine-readable — in other words, my transcript shouldn’t just be available on a watermarked piece of paper, my assignments not just stored in a PDF. It would mean that education data could no longer be stuck in silos, on or offline. The storage unit — let’s call it a personal data locker, with a nod to the Locker Project — would need to be sustainable: temporally, technologically, financially. It would need to follow a student throughout her or his school career, and ideally include informal as well as formal learning data. The locker would need to be extensible; that is, apps and visualization tools could be built on top of it.
And the contents of the data locker would be owned and controlled by students. What gets shared. What gets stored. What gets deleted. And yes, perhaps this would mean that the threat that “this will go down on your permanent record” becomes an emptier one.
Too far-fetched? Maybe.
But what if education data could not be aggregated or analyzed or tracked or sold without a student’s permission, without their informed consent, without a push notification on their smartphones, perhaps, that someone has accessed their info? How might this shape not just the ownership and control of education data, but students’ ownership and control over their own learning? How might students benefit from this shift?
As it stands, the benefit of much of the data being collected goes to the school or the software provider, but strangely not to the person who created it — to the learner.
And it is, after all, their data, their content, their learning — even if our laws and our policies and our technologies don’t fully recognize it as such. Yet.
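What might such a locker look like in practice? Here is a minimal Python sketch, assuming a single student-owned store in which a third party can read an item only with the student’s explicit, revocable consent. Everything in it is hypothetical: DataLocker, store, grant, revoke, read, and export are illustrative names, not the Locker Project’s actual API; the “push notification” is just a print statement; and a real locker would need authentication, encryption, and durable storage that this toy leaves out. What it tries to show are the properties above: the student decides what gets stored and deleted, consent is checked on every read, every access is logged and announced, and the whole locker can be exported in a portable, human- and machine-readable format.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataLocker:
    """Hypothetical student-owned data locker: the student decides what is stored and shared."""
    owner: str
    items: dict = field(default_factory=dict)       # item name -> JSON-serializable content
    grants: dict = field(default_factory=dict)      # requester -> set of item names it may read
    access_log: list = field(default_factory=list)  # (UTC timestamp, requester, item name)

    def store(self, name: str, content) -> None:
        self.items[name] = content

    def delete(self, name: str) -> None:
        """Deleting an item also withdraws every grant that referred to it."""
        self.items.pop(name, None)
        for allowed in self.grants.values():
            allowed.discard(name)

    def grant(self, requester: str, name: str) -> None:
        self.grants.setdefault(requester, set()).add(name)

    def revoke(self, requester: str, name: str) -> None:
        self.grants.get(requester, set()).discard(name)

    def read(self, requester: str, name: str):
        """Third-party read: refused without consent; logged and announced when allowed."""
        if name not in self.grants.get(requester, set()):
            raise PermissionError(f"{requester} has no consent to read {name!r}")
        content = self.items[name]  # KeyError if consent was granted but nothing was stored
        self.access_log.append((datetime.now(timezone.utc).isoformat(), requester, name))
        self.notify(requester, name)
        return content

    def notify(self, requester: str, name: str) -> None:
        # Stand-in for a push notification to the student's phone.
        print(f"[{self.owner}] {requester} accessed {name!r}")

    def export(self) -> str:
        """Portable, human- and machine-readable copy of everything in the locker."""
        return json.dumps({"owner": self.owner, "items": self.items}, indent=2, sort_keys=True)


if __name__ == "__main__":
    locker = DataLocker(owner="student-42")
    locker.store("essay-1", "What I did last summer...")
    locker.grant("tutoring-app", "essay-1")
    locker.read("tutoring-app", "essay-1")    # prints: [student-42] tutoring-app accessed 'essay-1'
    locker.revoke("tutoring-app", "essay-1")  # further reads by tutoring-app now raise PermissionError
    print(locker.export())
```

Keeping consent as per-item, per-requester grants, rather than one blanket opt-in, is what would let a student withdraw a single app’s access to a single essay without losing everything else.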
I’ve been thinking a lot lately about the OAuth and OpenID specifications, not just as authentication and identity technologies but, honestly, almost more as metaphors.
OAuth and OpenID are open technology standards for identity and access on the Web. OpenID lets a user sign in to many websites with a single identity; OAuth lets a user grant one site access to their digital resources on another site without handing over their password. The classic example, perhaps: you can sign up for an app using Facebook Connect so that you don’t have to create yet another username and password.
Often when developers sketch out these specifications, they’ll represent the exchange of data between the three legs — the platform, the app provider and the user as some describe it, or the server, the client and the resource owner — as an equal relationship. Lots of arrows that map out the requests and the authentication. But it’s almost always drawn as an equilateral triangle or a circle — as though the relationship there between the platform and the application and the end-user is balanced. But it’s not.
Don’t get me wrong. OAuth and OpenID do give the user some control here. It’s not the specifications I’m questioning here. And I don’t want to get into a debate in the Q&A section about the merits of OAuth 2.0.
Rather, I’m questioning how we draw the exchange of data between systems. I want us to think about technologies’ metaphors and their architecture and their power relations — they matter.
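To make those three legs concrete, here is a minimal, in-memory Python sketch of an OAuth 2.0-style authorization-code exchange. It is a toy under stated assumptions, not a real OAuth implementation: there is no HTTP, no redirect URI, no token expiry or refresh, the authorization server and resource server are folded into one object, and the names (Platform, register_client, authorize, exchange_code, read) are hypothetical. What it does show is the architecture the diagrams tend to flatten. The platform registers the apps, mints the codes and tokens, and holds every table of data, while the user’s part is a single yes-or-no at the consent step.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical platform acting as both authorization server and resource server."""
    user_data: dict                              # user -> {scope: data}; the platform holds all of it
    clients: dict = field(default_factory=dict)  # client_id -> client_secret
    codes: dict = field(default_factory=dict)    # one-time code -> (client_id, user, scopes)
    tokens: dict = field(default_factory=dict)   # access token -> (client_id, user, scopes)

    def register_client(self, client_id: str) -> str:
        """An app provider (the client) registers with the platform and receives a secret."""
        secret = secrets.token_urlsafe(16)
        self.clients[client_id] = secret
        return secret

    def authorize(self, user: str, client_id: str, scopes: set, user_approves: bool):
        """Consent step: the user (resource owner) can only approve or deny the request."""
        if client_id not in self.clients or not user_approves:
            return None
        code = secrets.token_urlsafe(16)
        self.codes[code] = (client_id, user, frozenset(scopes))
        return code

    def exchange_code(self, code: str, client_id: str, client_secret: str) -> str:
        """The app trades the one-time code plus its own credentials for an access token."""
        entry = self.codes.pop(code, None)  # pop() makes each code single-use
        if entry is None or entry[0] != client_id:
            raise PermissionError("invalid, reused, or mismatched authorization code")
        if self.clients.get(client_id) != client_secret:
            raise PermissionError("bad client credentials")
        token = secrets.token_urlsafe(16)
        self.tokens[token] = entry
        return token

    def read(self, token: str, scope: str):
        """The app reads user data, but only within the scopes the user approved."""
        if token not in self.tokens:
            raise PermissionError("unknown access token")
        _client_id, user, scopes = self.tokens[token]
        if scope not in scopes:
            raise PermissionError(f"token does not grant the {scope!r} scope")
        return self.user_data[user][scope]


if __name__ == "__main__":
    platform = Platform(user_data={"alice": {"profile": "Alice, grade 6", "grades": "A, B+"}})
    secret = platform.register_client("quiz-app")
    code = platform.authorize("alice", "quiz-app", {"profile"}, user_approves=True)
    token = platform.exchange_code(code, "quiz-app", secret)
    print(platform.read(token, "profile"))  # Alice, grade 6
```

In this toy, nothing lets Alice inspect the clients, codes, or tokens tables, or revoke a token, unless the platform chooses to build that for her. The protocol hands the user some control; the architecture around it decides how much.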
It actually makes me incredibly happy to raise these questions — particularly questions about data — here at WebWise. I’ve been trying to raise these questions in a variety of places lately — with university administrators, with K–12 teachers. And while I do want all of us to be more critical about our data collection and data usage, I am particularly keen to raise this issue here with this audience and its professional capacity to think smartly on this topic.
After all, I am thankful for the protests by librarians against laws like the Patriot Act and for their ongoing work to protect the privacy and the confidentiality of patrons’ library records. I’m also deeply appreciative of the work that archivists and others here do to think carefully — so carefully — about the preservation of artifacts, digital and otherwise, about the objects themselves and about their metadata. I also value the work that those professions here do regarding open and accessible knowledge, content, and data, as well as the challenges of thinking critically about the lines between what’s “mine” and what’s “ours.” These are all incredibly important insights that those working in education and in education technology need to learn from.
And I would hope that, as we move forward in our more digital, data-oriented world, this group can tackle the questions surrounding the ownership of learners’ data, and with them the centrality of the learner in these and all our discussions.
“Whose data is it?” “Whose learning is it anyway?” — it’s the learners’, right? But it’s also, at some point, the community’s: learning isn’t simply about the individual; it’s tied to the greater public good.
And as we move forward building more systems that capture and store and analyze learners’ data, how do we make sure we do so in a way that doesn’t extract value from the learner or from the community, trapping it in a technological silo to be mined simply for corporate profit, but instead creates more value: real-time value and long-lasting value?
I would hope that when we ask the question “whose learning is it anyway?” — and with a nod to TV improv too — that we can do so with a smile and not with horror, that we can think of human beings enhanced by technology and not just surveilled or mined or driven by it.