I'm working on an AI platform that helps researchers and data scientists find the right datasets across multiple sources (Kaggle, government portals, APIs, academic databases, etc.) using natural language search. Right now the process is very manual: lots of Googling, checking different sites, and dealing with inconsistent formats. I want to make it easy to find niche datasets for highly specific problems.
TL;DR: I think this could save researchers, ML engineers, and data scientists hours by aggregating datasets, summarizing them (columns, size, last updated), and even suggesting related datasets.
Longer explanation:
With this tool, you could type something like “I need data on smartphone usage and mental health for young adults” and it’ll find relevant datasets across platforms. It’ll also provide quick summaries so you know if it’s worth downloading without digging deep.
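To make the idea concrete, here's a rough sketch of what the search-and-summarize step might look like. Nothing is built yet, so everything in it is an assumption: the `DatasetSummary` fields, the `search` function, and the two catalog entries are invented for illustration, and plain keyword overlap stands in for the actual natural language matching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetSummary:
    """Hypothetical summary record: the fields a user sees before downloading."""
    name: str
    source: str            # e.g. "Kaggle" or a government portal
    description: str
    columns: list[str]
    size_mb: float
    last_updated: date


# Filler words to ignore when matching (an arbitrary, illustrative list).
STOPWORDS = {"i", "need", "data", "on", "and", "for", "the", "a", "of", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercase, strip surrounding punctuation, and drop filler words."""
    words = (w.strip(".,!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def score(query: str, ds: DatasetSummary) -> int:
    """Count query words that appear in the description or column names."""
    searchable = ds.description + " " + " ".join(ds.columns)
    return len(tokenize(query) & tokenize(searchable))


def search(query: str, catalog: list[DatasetSummary], limit: int = 5) -> list[DatasetSummary]:
    """Return up to `limit` datasets with at least one match, best first."""
    ranked = sorted(catalog, key=lambda ds: score(query, ds), reverse=True)
    return [ds for ds in ranked if score(query, ds) > 0][:limit]


# Made-up catalog entries, only here to show the shape of the data.
catalog = [
    DatasetSummary(
        name="phone_wellbeing_survey",
        source="Kaggle",
        description="Smartphone usage and mental health survey of young adults",
        columns=["age", "daily_screen_hours", "anxiety_score"],
        size_mb=12.5,
        last_updated=date(2024, 1, 15),
    ),
    DatasetSummary(
        name="city_air_quality",
        source="Government portal",
        description="Hourly air quality readings by city",
        columns=["city", "pm25", "timestamp"],
        size_mb=340.0,
        last_updated=date(2023, 6, 1),
    ),
]

query = "I need data on smartphone usage and mental health for young adults"
for ds in search(query, catalog):
    print(f"{ds.name} ({ds.source}): {len(ds.columns)} columns, "
          f"{ds.size_mb} MB, updated {ds.last_updated}")
```

A real version would probably swap the keyword overlap for embeddings or an LLM-based ranker, but the summary record is the part that lets you decide whether a dataset is worth downloading.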
Would this be useful?
Trying to see if this is actually something people would use before I start building. Feedback is appreciated! 🙏
By - paullieber98