Shawn M. Jones | Researcher | Software Engineer

Hypercane: Toolkit for Summarizing Large Collections of Archived Webpages

publication newsletter

Oct 1, 2021

Hypercane: Toolkit for Summarizing Large Collections of Archived Webpages

Accepted Future Publication

by Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

In the Dark and Stormy Archives (DSA) project, we focus on storytelling techniques to summarize collections of archived web pages. Since collections can have hundreds or even thousands of seeds (initial URLs) and each seed can be recrawled many times, with each version separat...

Hypercane: Intelligent Sampling for Web Archive Collections

publication poster

Sep 29, 2021

ACM/IEEE JCDL 2021

Hypercane: Intelligent Sampling for Web Archive Collections

Accepted Future Publication

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

Humans can choose individual documents from a web archive collection, but doing so is difficult if they are unfamiliar with the collection. The issue is scale. Most web archive collections consist of thousands of documents. Hypercane is a tool that automates the selection of d...

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

publication conference-paper

Sep 28, 2021

ACM/IEEE JCDL 2021

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

Accepted Future Publication

by Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, and Michael L. Nelson

Interoperability for Accessing Versions of Web Resources with the Memento Protocol

publication book-chapter

Jul 1, 2021

The Past Web: Exploring Web Archives

Interoperability for Accessing Versions of Web Resources with the Memento Protocol

by Shawn M. Jones, Martin Klein, Herbert Van de Sompel, Michael L. Nelson, and Michele C. Weigle

Automatically Selecting Striking Images for Social Cards

publication conference-paper

Jun 21, 2021

ACM Web Science 2021

Automatically Selecting Striking Images for Social Cards

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource’s title, text summary, striking image, a...

Robustifying Links To Combat Reference Rot

publication journal-article

Feb 10, 2021

code{4}lib Journal

Robustifying Links To Combat Reference Rot

by Shawn M. Jones, Martin Klein, and Herbert Van de Sompel

Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, ...

Web mentions

InfoDocket

SHARI -- An Integration of Tools to Visualize the Story of the Day

publication workshop-presentation

Aug 4, 2020

Web Archiving and Digital Libraries 2020

SHARI -- An Integration of Tools to Visualize the Story of the Day

by Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson

Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the “biggest story” for a given date. StoryGrap...

MementoEmbed and Raintale for Web Archive Storytelling

publication workshop-presentation

Aug 4, 2020

Web Archiving and Digital Libraries 2020

MementoEmbed and Raintale for Web Archive Storytelling

by Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display...

Web mentions

InfoDocket

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

publication conference-paper

Nov 3, 2019

ACM CIKM 2019

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

publication conference-paper

Sep 20, 2018

iPres 2018

The Off-Topic Memento Toolkit

by Shawn M. Jones, Michele C. Weigle, and Michael L. Nelson

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configu...

Web mentions

InfoDocket

publication conference-paper

Sep 20, 2018

iPres 2018

The Many Shapes of Archive-It

by Shawn M. Jones, Alexander C. Nwala, Michele C. Weigle, and Michael L. Nelson

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources...

Avoiding spoilers: wiki time travel with Sheldon Cooper

publication journal-article

Mar 1, 2018

International Journal on Digital Libraries

Avoiding spoilers: wiki time travel with Sheldon Cooper

by Shawn M. Jones, Michael L. Nelson, and Herbert Van de Sompel

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—inf...

publication poster

Jun 16, 2017

IIPC Web Archiving Conference 2017

Uniform Access to Raw Mementos

by Herbert Van de Sompel, Michael L. Nelson, Lyudmila Balakireva, Martin Klein, Shawn M. Jones, and Harihar Shankar

Most web archives augment Mementos when presenting them to the user, often for usability or legal purposes. Research efforts and software projects need access to the original captured “raw” Mementos. So that users and software do not need to resort to archive-specific solutions, ...

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

publication journal-article

Dec 2, 2016

PLOS ONE

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

by Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they r...

Persistent URIs Must Be Used To Be Persistent

publication poster

Apr 4, 2016

WWW 2016

Persistent URIs Must Be Used To Be Persistent

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Web mentions

DSHR's Blog, by David Rosenthal

Rules of Acquisition for Mementos and Their Content

publication technical-report

Feb 19, 2016

arXiv

Rules of Acquisition for Mementos and Their Content

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

Avoiding Spoilers in Fan Wikis of Episodic Fiction

publication preprint

Jun 20, 2015

arXiv

Avoiding Spoilers in Fan Wikis of Episodic Fiction

by Shawn M. Jones, Michael L. Nelson

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” ...

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

publication masters-thesis

May 30, 2015

Old Dominion University

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

by Shawn M. Jones

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” –...

Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

publication technical-report

Jun 16, 2014

arXiv

Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

by Shawn M. Jones, Michael L. Nelson, Harihar Shankar, Herbert Van de Sompel

We have implemented the Memento MediaWiki Extension Version 2.0, which brings the Memento Protocol to MediaWiki, used by Wikipedia and the Wikimedia Foundation. Test results show that the extension has a negligible impact on performance. Two 302 status code datetime negotiatio...
