Redact APIs
Redaction User Interface
There is a Web UI (by default at http://localhost:8890/redact/) for redacting a document and visualizing the results. This is intended to make adoption simple for business roles in production, but may also be useful as you learn the APIs.
The redact API accepts an arbitrary number of records in several supported document formats. It returns a field-level redacted version of the records that is valid for the stated purpose, along with the reasons behind every action taken on the document and statistics about the resulting content. The API will take a single record and return immediately, or act in bulk on a document of any size and return the full result asynchronously when ready. It's common for data to be dumped from operational or aggregate databases, business tools (like a CRM or BI tool), or automated reporting into a simple structured document such as a CSV or JSON file, and this API is designed to automate applying that static document to different business purposes.
When to use the Redaction API versus a Pipeline Export
A Pipeline Export can be configured to run in redaction mode, and can return the results of redacting an input document alongside the redaction reasons, so there is some overlap. Because it's designed to work in real time as a single component of a data pipeline, however, it doesn't support asynchronous operation and isn't intended as a solution for ad hoc queries from business teams. If you want to work on a per-record basis and track business policy side-effects, then Pipeline is the right interface. If, however, you need to support business or data science teams that have arbitrarily large documents that need on-demand redaction, you should use this API.
Example Input
As a common challenge, an advertising team has been given a dump of user data in CSV format. They want to assert the Targeted Ads Purpose, but don't know which subset of the data is compliant. The data dump came from the operator of an internal database who doesn't know which fields are needed, or which users may be targeted, so they may err on the side of over-sharing:
ID,Name,Email,CCNumber
1,Carol,carol@example.com,xxxxxxxxxxxx1234
2,Alice,alice@example.com,xxxxxxxxxxxx1234
3,Bob,bob@example.com,xxxxxxxxxxxx1234
4,Eve,eve@example.com,xxxxxxxxxxxx5678
5,Dan,dan@example.com,xxxxxxxxxxxx9012
The challenge for the advertising team is picking the permissible parts of this file without drawn-out calls to the legal department. The advertising team doesn't know the configuration, and even if they did, they don't know the context of each user. All they do know is that there's a mapping group they can use for this CSV file that was created by the operator of the internal database.
We met Alice and Carol in the user API section. Let's onboard the other three users now:
By the terms of our example configuration, when asserting TargetedAds, the following conditions apply:
- Carol is younger than an age-based override that explicitly disallows this purpose
- Alice has not accepted platform terms and is not under a contract, so none of her data may be used for any purpose
- Eve lives in a region with a location-based override that explicitly disallows this purpose
- Dan is covered under a contract that explicitly disallows this purpose
This leaves Bob, whose data may be used, but only if it falls into the contact category.
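To make the expected outcome concrete, here is a small shell sketch that mimics the row-level result by hand. It is only an illustration: the `DENIED` list and the `permitted_names` function are hypothetical stand-ins that hard-code the four conditions above, and they say nothing about how the redaction service itself evaluates policy.

```shell
# Hypothetical stand-in for the policy outcome described above:
# users for whom the TargetedAds purpose is disallowed in this example.
DENIED="Carol Alice Eve Dan"

# Read the example CSV on stdin, skip the header row, and print the Name
# of every row whose user is not in the hard-coded deny list.
permitted_names() {
  awk -F, -v denied="$DENIED" '
    BEGIN { n = split(denied, d, " "); for (i = 1; i <= n; i++) deny[d[i]] = 1 }
    NR > 1 && !($2 in deny) { print $2 }
  '
}
```

Feeding the five-row example file to `permitted_names` prints only `Bob`, which matches the reasoning above. Field-level redaction of the surviving row is a separate step that this sketch does not attempt.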
Redacting the Document
The flow of the API follows the flow of the UI. You start by issuing a redaction request:
curl -X PUT -H "Accept: text/csv" -H "Content-Type: text/csv" \
"http://localhost:8890/redact/document?purposeId=use.targetedads&type=platformUse&mappingGroup=Transform" \
-d 'ID,Name,Email,CCNumber
1,Carol,carol@example.com,xxxxxxxxxxxx1234
2,Alice,alice@example.com,xxxxxxxxxxxx1234
3,Bob,bob@example.com,xxxxxxxxxxxx1234
4,Eve,eve@example.com,xxxxxxxxxxxx5678
5,Dan,dan@example.com,xxxxxxxxxxxx9012'
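In practice the CSV usually lives in a file instead of being pasted inline. The following is a minimal sketch of a wrapper for the same request, assuming the endpoint and query parameters shown above. The names `REDACT_BASE`, `redact_document_url`, and `submit_csv` are made up for this example and are not part of the product.

```shell
# Base URL taken from the example above; override it for your deployment.
REDACT_BASE="${REDACT_BASE:-http://localhost:8890/redact}"

# Build the /document request URL from purposeId, type, and mappingGroup.
# Values are inserted as-is, so they must not need URL-encoding.
redact_document_url() {
  printf '%s/document?purposeId=%s&type=%s&mappingGroup=%s\n' \
    "$REDACT_BASE" "$1" "$2" "$3"
}

# Submit a CSV file for redaction and print the response body.
# --data-binary sends the file unchanged; -d @file would strip the newlines
# that separate the CSV rows.
submit_csv() {
  local csv_file="$1"
  shift
  curl --fail --silent --show-error -X PUT \
    -H "Accept: text/csv" -H "Content-Type: text/csv" \
    --data-binary "@${csv_file}" \
    "$(redact_document_url "$@")"
}
```

With a hypothetical file `users.csv`, the request above becomes `submit_csv users.csv use.targetedads platformUse Transform`.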
If you get back a 200 response, it will include the jobName for the queued-up redaction process. This can be used to check on stats, by calling:
If this returns 404, the redaction job is still running. If you get a 200, the job is done, and the response includes details explaining how data was redacted or dropped to meet the requirements of the stated purpose.
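The 404-then-200 behavior lends itself to a simple polling loop. The sketch below assumes only the status-code semantics described above. The stats URL is passed in by the caller and not constructed here, and the function names, poll interval, and retry limit are hypothetical choices for this example.

```shell
# Map the HTTP status of the stats call to a job state:
# 200 means the job is finished, 404 means it is still running.
job_state() {
  case "$1" in
    200) echo finished ;;
    404) echo running ;;
    *)   echo error ;;
  esac
}

# Poll the given stats URL until the job finishes.
# $1: stats URL, $2: seconds between polls, $3: maximum number of polls.
# The defaults for $2 and $3 are arbitrary values chosen for this sketch.
wait_for_job() {
  local stats_url="$1"
  local interval="${2:-2}"
  local max_tries="${3:-30}"
  local tries=0
  local code
  while [ "$tries" -lt "$max_tries" ]; do
    # Discard the body and capture only the HTTP status code.
    code=$(curl --silent --output /dev/null --write-out '%{http_code}' "$stats_url")
    case "$(job_state "$code")" in
      finished) return 0 ;;
      running)  sleep "$interval" ;;
      *)        echo "unexpected HTTP status: $code" >&2; return 1 ;;
    esac
    tries=$((tries + 1))
  done
  echo "timed out waiting for job" >&2
  return 2
}
```

Treating every status other than 200 and 404 as an error keeps the loop from spinning on a misconfigured URL or an unreachable server.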
Once the redaction job is complete, you can retrieve the field-level redacted file:
curl -X GET -H "Accept: application/octet-stream" \
"http://localhost:8890/redact/document?jobName=NAME"
This returns a CSV that has dropped the impermissible rows, and has redacted the fields that aren't part of an allowed category (in this case, CCNumber) or that don't have an associated category in the mapping definition (in this case, ID):
This redacted document is now valid for the Platform Use Purpose Targeted Ads, and the decision process has been audited in the trace stream, so that anyone can see the right process was run to generate this view.
Testing a Single Record
If you want to see what the result would be for only a single row, you can make a single-record request. For instance, as a developer or tester you might want to check what the result would be just for Alice in the previous example. To do this, you would issue:
curl -X PUT -H "Accept: application/json" -H "Content-Type: application/json" \
"http://localhost:8890/redact/record?purposeId=use.targetedads&type=platformUse&mappingGroup=Transform" \
-d '{
"ID" : 2,
"Name" : "Alice",
"Email" : "alice@example.com",
"CCNumber" : "xxxxxxxxxxxx1234"
}'
This provides a single record using the same structure as the CSV example, but instead of having to test for status and download the resulting document when it's ready, the caller gets the resulting document back immediately. If you want to get back just the reason string for any action taken by Tranquil Data on this record, and not the redacted record itself, you can use a Pipeline Export configured to only return advice with the attribute identifier urn:tranquildata:attribute:message:summary.
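When testing rows from an existing CSV, it can be convenient to generate the JSON body instead of typing it by hand. The following sketch assumes a simple CSV with no quoted or embedded commas and does no JSON escaping. `csv_row_to_json` and `redact_record` are hypothetical helper names, and the endpoint is the one used in the single-record request above.

```shell
# Convert one data row of a simple CSV into a JSON object.
# $1: CSV file, $2: data row number (1 is the first row after the header).
csv_row_to_json() {
  awk -F, -v row="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) key[i] = $i; next }
    NR == row + 1 {
      printf "{"
      for (i = 1; i <= NF; i++) {
        # Emit purely numeric values unquoted, everything else as a string.
        val = ($i ~ /^[0-9]+$/) ? $i : "\"" $i "\""
        printf "%s\"%s\": %s", (i > 1 ? ", " : ""), key[i], val
      }
      print "}"
    }
  ' "$1"
}

# Send a JSON record read from stdin to the single-record endpoint.
# $1: purposeId, $2: type, $3: mappingGroup (inserted into the URL as-is).
redact_record() {
  curl --fail --silent --show-error -X PUT \
    -H "Accept: application/json" -H "Content-Type: application/json" \
    --data-binary @- \
    "http://localhost:8890/redact/record?purposeId=$1&type=$2&mappingGroup=$3"
}
```

With the example data saved in a hypothetical `users.csv`, Alice's row can be checked with `csv_row_to_json users.csv 2 | redact_record use.targetedads platformUse Transform`.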