Authenticating with Public/Private Keypair
Public/Private Keypair authentication is recommended when using the NERD API for production use cases. While it requires some additional setup, this method is the most secure and, in the long term, the easiest way to use NERD. For testing and development, individual users should check out the authentication quickstart guide.
You can also check out our video guide to authentication methods.
Keypair Setup
Generate an RSA Keypair
In this guide, we will use the openssl command-line tool, which is available on most Unix systems (including macOS).
First, open a terminal and generate a 2048-bit private key:
openssl genrsa -out private.pem 2048
Next, extract the public key:
openssl rsa -in private.pem -outform PEM -pubout -out public.pem
Send Kensho Your Public Key
Email support@kensho.com with your PEM-encoded public key as an attachment. We will respond with your Client ID. You will need this ID in the following step. While typical response times are very quick, please allow up to three business days for us to process your request.
Important: Do not send us your private key! While your public key and Client ID are not secret, your private key should not be shared outside your organization.
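Because public.pem and private.pem are easy to mix up, a quick check of the PEM header before attaching a file can help. The following is a minimal sketch, not part of the NERD tooling: pem_kind and safe_to_send are hypothetical helper names, and the check only inspects the BEGIN line of the file (for example "BEGIN PUBLIC KEY" versus "BEGIN RSA PRIVATE KEY" or "BEGIN PRIVATE KEY", depending on the openssl version).

```python
def pem_kind(pem_text):
    """Classify PEM text as "public", "private", or "unknown" from its BEGIN line."""
    for line in pem_text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            # Extract the label between the dashes, e.g. "RSA PRIVATE KEY".
            label = line[len("-----BEGIN "):-len("-----")]
            if label.endswith("PRIVATE KEY"):
                return "private"
            if label.endswith("PUBLIC KEY"):
                return "public"
            return "unknown"
    return "unknown"


def safe_to_send(path):
    """Return True only if the file at `path` looks like a PEM public key."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return pem_kind(f.read()) == "public"
```

For example, safe_to_send("public.pem") should be True and safe_to_send("private.pem") should be False for the files generated above.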
Use Your Private Key and Client ID to Generate an Access Token
Most languages have JWT (JSON Web Token) libraries. In this example, we make use of PyJWT, a JWT library for Python.
We provide a get_access_token_from_key helper function that signs a short-lived JWT with your private key and exchanges it for an Access Token.
import jwt
import requests
import time


def get_access_token_from_key(client_id):
    PRIVATE_KEY_PATH = "private.pem"  # the location of your generated private key file
    with open(PRIVATE_KEY_PATH, "rb") as f:
        private_key = f.read()
    iat = int(time.time())
    encoded = jwt.encode(
        {
            "aud": "https://kensho.okta.com/oauth2/default/v1/token",
            "exp": iat + (30 * 60),  # expire in 30 minutes
            "iat": iat,
            "sub": client_id,
            "iss": client_id,
        },
        private_key,
        algorithm="RS256",
    )
    response = requests.post(
        "https://kensho.okta.com/oauth2/default/v1/token",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        data={
            "scope": "kensho:app:nerd",
            "grant_type": "client_credentials",
            "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
            "client_assertion": encoded,
        },
    )
    return response.json()["access_token"]


CLIENT_ID = ""  # paste your Client ID sent to you by Kensho inside the quotation marks
ACCESS_TOKEN = get_access_token_from_key(CLIENT_ID)
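If the token request is rejected, it can help to look at the claims that were actually signed. The sketch below is an illustration rather than part of the provided helper: decode_jwt_claims is a hypothetical name, it uses only the standard library, and it does not verify the signature, so it is suitable for debugging only. It could be applied to the value held in encoded inside get_access_token_from_key.

```python
import base64
import json


def decode_jwt_claims(token):
    """Return the claims (payload) of a compact JWT without verifying its signature."""
    if isinstance(token, bytes):  # older PyJWT versions return bytes from jwt.encode
        token = token.decode("ascii")
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))
```

Checking that "iss" and "sub" both equal your Client ID, and that "exp" lies in the future, covers the claims set by the helper above.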
Now that you have an Access Token, you are ready to use NERD! Head over to the Text Annotation Guide to get your first NERD annotations.
Production Use
Using your Public/Private Keypair allows you to always generate a fresh Access Token. This token expires every hour, on the hour, so a long-running application will need to regenerate it. We provide an example in Python below. Note that this snippet uses the get_access_token_from_key function defined above. For more details on using the NERD API, check out the Text Annotation Guide.
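One possible complement to regenerating a token after a request fails is to cache the token and refetch it shortly before it is expected to expire. The following is a minimal sketch under stated assumptions: CachedToken is a hypothetical class, fetch is any zero-argument callable that returns a token (for example a wrapper around get_access_token_from_key), and lifetime_seconds is a value you supply yourself, since the sketch does not read the expiry from the API.

```python
import time


class CachedToken:
    """Cache a token and refetch it once it is close to its assumed expiry."""

    def __init__(self, fetch, lifetime_seconds, margin_seconds=60, clock=time.monotonic):
        self._fetch = fetch                # zero-argument callable returning a token
        self._lifetime = lifetime_seconds  # assumed lifetime, supplied by the caller
        self._margin = margin_seconds      # refresh this many seconds early
        self._clock = clock                # injectable so the class can be tested
        self._token = None
        self._refresh_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one when it is missing or stale."""
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            self._token = self._fetch()
            self._refresh_at = now + self._lifetime - self._margin
        return self._token
```

A cache like this might be built as CachedToken(lambda: get_access_token_from_key(CLIENT_ID), lifetime_seconds=...) with the lifetime set to whatever applies to your tokens. Because the assumed lifetime can be wrong, retrying once on a 401 response is still a sensible fallback.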
The code below defines a NerdClient class that can be used to make requests to the NERD asynchronous endpoint. This client will update the access token when needed (on first use and whenever a request returns HTTP 401), using public/private keypair authentication. You will need to paste your Client ID (emailed to you by our support@kensho.com team in the steps above) in the field CLIENT_ID. Below the NerdClient definition, the client is used to read text files, send them to the NERD API, and get the responses.
import json
import requests
import time
import os

NERD_API_URL = "https://nerd.kensho.com/api/v1/annotations-async"
CLIENT_ID = ""  # paste your Client ID sent to you by Kensho inside the quotation marks


class NerdClient:
    """A class to call the NERD API that automatically refreshes tokens when needed."""

    def __init__(self, client_id):
        self.client_id = client_id

    def update_access_token(self):
        self.access_token = get_access_token_from_key(self.client_id)

    def call_api(self, verb, *args, headers=None, **kwargs):
        """Call NERD API, refreshing access token as needed."""
        # Copy the headers so that neither a shared default nor the caller's dict is mutated.
        headers = dict(headers or {})
        if not hasattr(self, "access_token"):
            self.update_access_token()
        method = getattr(requests, verb)

        def call_with_updated_headers():
            headers["Authorization"] = f"Bearer {self.access_token}"
            return method(*args, headers=headers, **kwargs)

        response = call_with_updated_headers()
        if response.status_code == 401:
            # The token was rejected (for example, it expired): fetch a new one and retry once.
            self.update_access_token()
            response = call_with_updated_headers()
        return response

    def make_async_annotations_request(self, data):
        """Make a POST call to NERD Async Endpoint."""
        response = self.call_api(
            "post",
            NERD_API_URL,
            data=json.dumps(data),
            headers={"Content-Type": "application/json"},
        )
        return response.json()["job_id"]

    def get_async_annotations_results(self, job_id):
        """Get annotations results from NERD Async Endpoint."""
        while True:
            response = self.call_api("get", NERD_API_URL + "?job_id=" + job_id)
            result = response.json()
            if result["status"] != "pending":
                break
            time.sleep(10)  # wait before polling again
        return result


# data preparation
file_dir = ""  # file path to directory containing documents you want NERD to process
files = os.listdir(file_dir)
job_dict = {}  # dict to store file_name/job_id pair
data = {"knowledge_bases": ["capiq"]}

# create a nerd client
nerd_client = NerdClient(CLIENT_ID)

# submit requests to async endpoint
for file_name in files:
    file_name = os.path.join(file_dir, file_name)
    with open(file_name, "r") as f:
        text = f.read()
    data.update({"text": text})
    job_id = nerd_client.make_async_annotations_request(data)
    job_dict.update({file_name: job_id})
    print(f'Submitted {file_name} as {job_id}')
    time.sleep(0.1)

# retrieve results from async endpoint
for file_name, job_id in job_dict.items():
    file_name += '.nerd.json'
    result = nerd_client.get_async_annotations_results(job_id)
    print(f'Wrote result for {job_id} to {file_name}')
    with open(file_name, 'w') as result_file:
        json.dump(result, result_file, indent=4)
    time.sleep(0.1)
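The get_async_annotations_results method above polls for as long as the job status is "pending", with no upper bound. For unattended jobs, a deadline may be preferable. The following is a minimal sketch of bounded polling, not part of the NERD client: poll_until_done is a hypothetical helper, fetch_result is any zero-argument callable returning the parsed JSON response, and the timeout is a value you choose.

```python
import time


def poll_until_done(fetch_result, timeout_seconds, interval_seconds=10,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_result() until its "status" is no longer "pending" or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        result = fetch_result()
        if result["status"] != "pending":
            return result
        # Give up if the next wait would carry us past the deadline.
        if clock() + interval_seconds > deadline:
            raise TimeoutError("job still pending after the allotted time")
        sleep(interval_seconds)
```

With the client above, this could be called as poll_until_done(lambda: nerd_client.call_api("get", NERD_API_URL + "?job_id=" + job_id).json(), timeout_seconds=600), where 600 is an arbitrary example value.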