On Educational Data Mining

star collage (hydra)

The Department of Education released a draft report about big data and education today. It's called "Enhancing Teaching and Learning through Educational Data Mining and Learning Analytics," a title that's unlikely to win any converts to the notion of a data-curious* view of learning. Part of what's going to get stuck in the craw is that phrase "data mining," I reckon.

Despite all the potential and all the buzz about (big) data, data mining remains something with a fairly negative connotation. Advertisers. Political campaigns. Big government. All sifting through your personal data, trying to uncover the things that nobody knows about, trying to get you to buy or sell or vote. Add to that the knowledge that every click we make online -- every YouTube view and Facebook like and Google query -- is eminently trackable, and it's enough to make all those unsolicited phone calls and junk mail seem quite benign, not to mention old-fashioned.

As NYU doctoral student Solon Barocas argued in an interview at O'Reilly's Strata conference last year, that notion of data mining may be inaccurate, but the phrase "almost intuitively for most consumers implies scavenging through the data, trying to find secrets that you don't necessarily want people to know." I recommend Barocas's interview in part because, in that six-minute video, you'll see a data scientist push back on the notion that data mining is simply a high-tech form of snooping. Rather, data mining is a way of finding patterns and trends in large datasets using statistics and machine learning.

But Barocas is also clear that there are serious ethical concerns to be weighed surrounding data mining -- and PR ramifications, of course, if questionable data mining practices are made public.

So what exactly would we construe as "questionable data mining practices" in education? And what exactly should we consider as useful ones?

The latter goes to the heart of the Department of Education report, which makes a case for the importance of data mining and learning analytics in (perhaps) helping answer questions like:

  • What sequence of topics is most effective for a specific student?
  • What student actions are associated with more learning?
  • What student actions indicate satisfaction, engagement, learning progress, etc.?
  • What will predict student success?
  • When is a student falling behind and/or at risk for not completing a course?

"Commercial entities have led the way in developing techniques for harvesting insights from this mass of data for use in identifying likely consumers of their products, in refining their products to better fit consumer needs, and in tailoring their marketing and user experiences to the preferences of the individual," reads the report. I think it's worth asking critical questions about whether and how we can apply these same techniques to education. Is a consumer the same thing as a learner? Why or why not?

For its part, the Department of Education report touts Netflix as an exemplary model for taking consumers' data, "mining" it, and building something useful for those very users, creating models and profiles so that it can make recommendations based on viewing and rating information. That's worth repeating: user data isn't just extracted for its value to Netflix (although yes, that happens too). Through big data analytics, something of value is built for users in turn. It is worth noting, however, that while the Netflix example is often used to demonstrate how useful the insights gleaned from our online activity can be, I find it also a good indication of how far we still have to go before these algorithms really "get" us -- "get" our movie preferences, let alone our learning habits. (No, Netflix, just because I watched Terminator does not mean I might like to watch Beverly Hills Cop III. What it means is that you have a lousy selection of streaming content. But I digress…)

The Department of Education's report recognizes too that there are still lots of obstacles ahead for educational data analytics. There's the technology. There's the access to the data (most of which remains siloed across a multitude of systems). There are inconsistencies in data collection, storage, and formats. There are questions of institutional capacity (who's storing and who's analyzing all this data for a school?). And there are privacy and ethical concerns.

Added to these obstacles -- and not mentioned anywhere in the report -- is the question of data ownership. Who owns the data in an LMS, for example? The LMS provider? The school? The instructor? The student? How do we make sure that, in our rush to uncover insights and build "personalized" systems, the student isn't just the object of all this analysis and decision-making? How do we make sure the student has agency and control -- over their data and their learning?

* I purposefully use the adjective "data-curious" here in lieu of "data-driven," in part because the latter has become politicized beyond the point of meaning. But also because I'm not sure we actually know enough about educational data yet to let it "drive" our conversations, let alone our policies. Until then, I do remain curious…

Photo credits: Helen Cook


