Predefined datasets
I'd like access to predefined datasets that provide ideas for my research and make it easy to get started.

3 comments
-
Ricardo Correia commented
I agree this would be a fantastic resource. It would also provide a standardised baseline dataset that could be used, for example, in machine learning or natural language processing applications. Researchers have compiled such datasets in the past for specific purposes, but if Twitter were able to provide freely available datasets covering specific topics or languages, this would be a great resource for the community going forward.
-
Igor Brigadir commented
One example that comes to mind where this was done really well in the past is https://blog.twitter.com/engineering/en_us/a/2015/evaluating-language-identification-performance.html. More of the same kind of thing, on various topics beyond language identification, would be great.
-
Geoff Bacon commented
This is a really great idea, esheehan. One of the reasons I like it is that it would allow researchers to build out the rest of their project before having to pull data from the API. For example, I could use such predefined datasets to refine my preprocessing steps on realistic Twitter data before deciding what data I'd like to pull. Being able to play around with real data is always a great way to come up with new ideas for questions to ask.