New endpoint for detecting whether tweets are still live
If you have a large collection of tweets, it can be unwieldy to use the API to detect which tweets have been deleted. Beyond complying with Twitter's own terms of service, this is also necessary to respect users' agency over their own content. When tweets are no longer available, this can also affect which results of academic research can be reported, and how.
When sharing a dataset of tweet ids, you can also spend a lot of time trying to hydrate a dataset in which most of the tweets have been deleted.
It would significantly simplify this process if there were an endpoint that, given a list of tweet ids, returned either which tweets are still live or which have been deleted, without returning any of the actual content.
With a higher rate limit (since it doesn't need to return actual content), keeping a collection aligned with what is still publicly available would be significantly simpler.
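For comparison, the workaround available today can be sketched in Python, assuming the v1.1 statuses/lookup limit of 100 ids per request. `lookup_batch` is a hypothetical caller-supplied function that performs one authenticated request and returns the tweet objects received. Any requested id missing from the response is treated as unavailable; the response does not say whether the tweet was deleted, protected, or belongs to a suspended account. Every call still transfers full tweet JSON only to compute a set difference, which is the overhead the proposed endpoint would avoid.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# Assumed per-request id limit for statuses/lookup.
BATCH_SIZE = 100


def chunked(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def partition_live(
    ids: Iterable[str],
    lookup_batch: Callable[[List[str]], List[dict]],
) -> Tuple[Set[str], Set[str]]:
    """Split tweet ids into (live, unavailable).

    `lookup_batch` is hypothetical: it should send one lookup request for
    the given ids and return the tweet objects that came back. Ids that are
    absent from the response are counted as unavailable.
    """
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep order
    live: Set[str] = set()
    for batch in chunked(unique):
        for tweet in lookup_batch(batch):
            live.add(tweet["id_str"])
    unavailable = set(unique) - live
    return live, unavailable
```

With a stub such as `lambda batch: [{"id_str": i} for i in batch if i in {"1", "3"}]`, calling `partition_live(["1", "2", "3"], stub)` yields `({"1", "3"}, {"2"})`.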
Digital Observatory Team commented
Main differences from the existing functionality:
- Higher rate limit, enabled by a much smaller payload. Why retrieve the raw JSON for a tweet again just to check whether it's live, when the fields I care about, like the tweet content, are immutable and already in my collection?
Although the lookup API can do this (and a higher rate limit would ease the process), it's challenging to build a good pipeline around it. Most use cases of tweet data require processing the raw JSON into some other, more useful schema, so the re-fetched JSON has to be discarded or reprocessed just to learn whether a tweet is still live.
I don't see this as a replacement for statuses/lookup in terms of compliance, more an adjunct that allows a more rapid response and the ability to more frequently keep a collection in sync with live Twitter.
If implemented, the results should also tell you when a tweet was deleted.
Igor Brigadir commented
For extremely large collections spanning several years, statuses/lookup is cumbersome; it might be better to provide access to the compliance/firehose stream instead?
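A rough sketch of what consuming such a stream could look like, assuming newline-delimited JSON in which a status deletion notice has the shape `{"delete": {"status": {"id_str": ...}, "timestamp_ms": ...}}`. That message format is an assumption here, not a confirmed schema. `apply_delete_notices` is a hypothetical helper: it prunes a local collection keyed by `id_str` and records the deletion time when the notice carries one, which would also cover the earlier wish to know when a tweet was deleted.

```python
import json
from typing import Dict, Iterable, Optional


def apply_delete_notices(
    collection: Dict[str, dict],
    lines: Iterable[str],
) -> Dict[str, Optional[int]]:
    """Remove deleted tweets from `collection` (keyed by id_str).

    Returns {tweet_id: deletion time in epoch ms, or None if the notice
    has no timestamp} for each notice whose tweet was in the collection.
    The notice shape handled here is an assumption (see lead-in).
    """
    removed: Dict[str, Optional[int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. keep-alives
        message = json.loads(line)
        notice = message.get("delete")
        if notice is None:
            continue  # some other message type; ignore it
        tweet_id = notice["status"]["id_str"]
        if tweet_id in collection:
            del collection[tweet_id]
            ts = notice.get("timestamp_ms")
            removed[tweet_id] = int(ts) if ts is not None else None
    return removed
```

Notices for tweets not in the collection are ignored, so the same stream can be replayed against several collections.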
Adam Tornes commented
Thank you for this feedback. How are you thinking this functionality might differ from the existing statuses/lookup endpoint (beyond potentially giving a boolean/response vs. actual content)? Is this simply a request for higher rate limits to an endpoint like this? Would you prefer some kind of batch lookup?