How to Share Data (Hint: “Thoughtfully”)
This blog post is part of a blog series on “Open Data for Public Good,” a collaboration between the AWS Institute and AWS Open Data aimed at identifying emerging issues around open data and offering best practices for data practitioners. Read the first post here.
Sharing data requires more than just making it available for download or creating an API to access it. In many ways, sharing data is similar to shipping a software product. Just like software, data is made up of digital information; it requires documentation; it will be used by groups of users who may require support; and it may become vital to those users’ work. Data also shares another common characteristic of software: it often gets updated over time as its developers learn from their users and adapt to new technologies.
So how should you share data? Thoughtfully. There are technical and non-technical considerations. Below, we explore the non-technical considerations, which require a focus on data governance and community engagement.
Trust
For data to be useful, it must come from a reliable source that users trust. If users do not believe that data has been produced or documented with sufficient rigor, they will be less likely to rely on it. The USGS Landsat program provides an important lesson in trust.
Launched in 1972, the Landsat program has been a reliable source of data and imagery of the Earth for decades. This continues to be the case despite an incident in 2003 when one of the instruments on Landsat 7 failed, which caused gaps in data produced by the sensor from that point on. The Landsat team worked to document how users could still use data from Landsat 7 despite the limitations imposed by the instrument failure. This transparency in operations helped maintain the trust of users in data generated by the program. If users understand the value and limitations of data, they will be more likely to use it, even if it’s not perfect. This requires open and clear communication with users.
Jupiter: Hubble’s decades of observations of the planets in the outer Solar System allow astronomers to study their seasonal variations and provide support for NASA’s dedicated suite of spacecraft that visit these celestial bodies. Credit: NASA, ESA, and A. Simon (NASA/GSFC).
Documentation
If data is not documented, its audience will be limited. There are times when users will do the detective work required to interpret poorly documented data, but a lack of documentation will usually frustrate users to the point that they will not trust the data. Users should be able to understand when data was created, the methodology used to create it, how to interpret the values contained in it, and whether any licenses limit how the data can be used. Documentation should also include a way to contact someone who can answer questions about the data. Ideally, documentation should include tutorials that users can follow to get hands-on experience with the data.
Reliability
Developers will not put in the effort to create tools or applications based on data if they have no assurance that the data will be available in the future. Ensuring that data will be available on an ongoing basis is especially important when sharing large volumes of data.
The cloud provides infrastructure for storage, low-latency access, and transfer of data. This becomes more important as data volumes grow. When data is shared in the cloud, users can access data quickly and directly from the source, which assures them that they can reliably access a trustworthy copy of the data without the need to duplicate storage in their own account.
Conclusion
In an era where governments and organizations are opening their data up to the public, making sure that data is shared in a deliberate and transparent way is key to establishing trust with users and increasing the utility of data. For information on technical considerations behind sharing data, visit opendata.aws.
A post by Jed Sundwall, Manager, Open Data Program, AWS