Ray Schwartz is a systems librarian at William Paterson University in Wayne, New Jersey. He frequently presents on topics relating to the use of many forms of electronic transactional data and data mining. He is teaching a course for Library Juice Academy next month called “Collecting and Evaluating Electronic Transactions from Library Services.” He agreed to do an interview here to give people a better idea of what will be covered in the class and where he is coming from.
Hi, Ray. Thanks for agreeing to do this interview. I’d like to start by asking about your background, that is, how you came to be doing the work that you’re doing that led you to propose this course to us.
I began working in libraries back in the late 80s as a library assistant at Columbia University. It was there that I developed a strong interest in computing and libraries. After I acquired my MLS in 1991, I worked as a cataloger until 1994, when I took a job as an “Electronic Resources” Librarian at Rutgers University. I was always interested in the underlying workings of systems, be it library catalogs or any other kind of machine. But of course the main reason we do this work is to have an impact upon our users and society as a whole. Hence the interest in analytics and other assessment tools.
It seems like one of the benefits of more sophisticated systems is that there is a greater variety of data that you can use for that purpose. Part of the problem with relying heavily on circulation data, from my point of view, is that it doesn’t tell you anything about the nature of the use – whether a book was checked out along with 50 others for skimming or read closely again and again, for example – and also that books used within the library and put back on the shelves aren’t represented in circulation counts. What are some ways that the greater abundance of data now is improving your ability to make judgments that are numbers-based?
Before I get to your question, I would like to comment on what these numbers tell one about the “nature of use.” In short—not much. Without qualitative tools, such as focus groups, one’s ability to know the “nature of use” is very limited. Take interlibrary loan, for example: one cannot assume that charging out materials from ILL means a patron approves of the quality of these items. Past studies have shown that books are often requested by a patron for evaluation and subsequently rejected. As for your observation about books being used in the library and later reshelved, some libraries do record that usage by scanning the barcode as a “browse” rather than a “charge” before reshelving. One always has to be careful not to over-interpret the data.

However, given that there is more variety in what is collected, there is more opportunity for “triangulating” the transactional trails. Before, we could not look at article usage, but now we can see those numbers. One can compare book circulation, database use, catalog use, web site use, and other services by a given category of user (e.g., undergraduate history majors). Of course, our on-campus use of databases does not record any categories regarding the patron (only the location of the machine and the browser software), whereas off-campus use has all patron categories recorded. So, to get back to your question, it is possible to see whether a select group of patrons uses one service more than another. This analysis can lead to reallocation of resources, changes in marketing and/or support, and further inquiries via qualitative tools. Web analytics, though difficult to implement, are also a set of tools with which one can dive further into the use of web resources: how long people remain on a particular web page, where they clicked from, and how many new and returning users there are.
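[Editor’s note: the kind of cross-service “triangulation” Ray describes – tallying transaction counts per patron category across services – can be sketched in a few lines. This is a minimal illustration with invented service names, categories, and data, not any library’s actual schema or tooling.]

```python
from collections import defaultdict

# Hypothetical transaction log: (service, patron_category) pairs.
# In practice these would come from ILS reports, proxy logs,
# web analytics exports, etc.
transactions = [
    ("book_circulation", "undergrad_history"),
    ("database_search", "undergrad_history"),
    ("database_search", "undergrad_history"),
    ("website_visit", "undergrad_history"),
    ("catalog_search", "grad_biology"),
    ("book_circulation", "grad_biology"),
]

# Cross-tabulate: count each service's use per patron category.
usage = defaultdict(lambda: defaultdict(int))
for service, category in transactions:
    usage[category][service] += 1

for category in sorted(usage):
    print(category, dict(usage[category]))
```

Even a toy cross-tab like this shows, for instance, that the hypothetical undergraduate history majors hit databases more than the catalog – the sort of signal that would then prompt the follow-up qualitative inquiry Ray mentions.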
Okay, you’ve referred to some of the data sources and techniques that you will be covering in the class. Can you say a bit more about what you will cover and what participants can expect to come away with?
The course is a holistic overview of academic library operations with respect to transactional data collection, data manipulation, and analysis—both current and potential. From a systems librarian’s perspective, I will cover what data can be collected; what tools, coding, and/or database management skills are needed for collection, evaluation, and analysis; and the issues of privacy and data retention. We will explore how those tools and skills can be used to warehouse and mine data. Examples will range from in-house solutions built by skilled programmers to the average computer user operating Google Analytics, Microsoft Access, and Excel. The types of analyses demonstrated will be web analytics, trend visualization, and dashboards. The course will not teach students how to use the tools, but will introduce them to the tools’ capabilities, accessibility (e.g., cost), advantages, and disadvantages. What I intend for students to walk away with is an understanding of both the potential and the limitations of what one can do with this data.
That sounds great.
One interest of mine personally is critical analysis of the way we use data in libraries, so I am pleased that you will give attention to the limitations of what we can do with the data that is collected. A favorite example of how it is not always obvious what conclusions are to be drawn from data is when a typical academic library director will draw two opposite conclusions from indications of lower-than-expected resource use. If it’s books, then the conclusion tends to be that the collection budget ought to be shrunk in order to transfer funding to meet patron demand. If it is an electronic resource, then the resource is “underutilized” and needs to be actively promoted. Just something I have observed over the years. I am not sure if that type of phenomenon is within the scope of your class, but the way that people reason with data could fall into the category of analysis of data as you cover it, I would imagine. Do you think that the availability of data and tools creates a temptation to overuse data, or to draw conclusions from available data that are not valid, or strongly determined by the way that the data is produced?
People always over-interpret whatever data they have, or even use the data to justify prior agendas. One must understand what is REALLY being recorded and be disciplined in his or her approach. Yes, the course will emphasize this issue throughout. To answer the latter part of your question, I tend to believe that the introduction of new tools has what I call a ‘honeymoon’ period. People (particularly in our culture) are seduced by new and powerful applications. It is only once people get more experience with an application that the questions come up. It is like looking at the world through a long tube: there is only so much you can see.
Thanks, that’s insightful. I have a final question as a way of following up. I wonder what you would suggest if you had the opportunity to teach any class that you wanted, even if it might seem unusual or unorthodox, or of limited interest. What would be your dream course to teach?
Interesting question. I would probably give you a different answer depending on where my mind is at the moment. Personally, I would love to structure a ‘hackathon’ whose outcome makes an important impact on people’s lives. That sounds very general, but what I mean is getting a group together to crunch the numbers and processes (and by processes I mean how numbers are portrayed and used) in a way that motivates people politically to improve their own and others’ lives. The hackathon would not have only coders attending; in order to function, it would need a variety of people with various skills, outlooks, and experience. The underlying intent is to show people how to work together to create change.
That’s an extremely interesting idea – thank you. This has been an interesting interview. Best of luck with your class, and thanks for taking the time to do this.
You’re most welcome.