San Francisco State University offers a graduate-level data mining course, CSC 869, as part of the computer science program. It is also part of the requirements for the Master of Science in Business Analytics degree.
Data mining is the process of extracting insights from raw data. In business, it uncovers patterns that inform decisions, sharpen strategy and improve operations.
Ben Dalziel, who graduated from SFSU in 2009 with a master’s degree in computer science, took CSC 869 in its infancy and is now the chief product officer at Slytrunk.
“Through the course, I learned and understood the basics of data mining and some of the tools and strategies that you can apply,” Dalziel said. “And I ended up using a lot of those techniques as part of my thesis, my graduate project.”
To Dalziel, data mining means turning information into insight or knowledge. It involves everything humans interact with on the internet or in the real world. He says that data mining is a valuable asset for anyone starting or running a business.
“Especially with a small or just starting a business, there is a lot of commoditized data mining functionality out there,” Dalziel said. “Amazon has a ton of it through AWS (Amazon Web Services), Microsoft has a good amount and Google has a good amount, and so those are resources where you don’t need to have a whole data science team working on developing and training a model and doing all that stuff.”
Data mining is necessary to gain insights into various areas ranging from minor changes in daily heart rate, as measured by wearable devices like Apple Watch or Fitbit, to more complex analysis of genetic data for biological and medical research.
“Regardless of the specific application, data mining is essential for gaining valuable insights,” Dalziel said.
Shaw Walters is the CEO of Upstreet Labs, a virtual world where users, researchers and developers work together to build next-gen data for training agents and robotics systems.
Walters believes data is crucial and can be gathered in two ways –– from users or by scraping the web, which has become a hot topic due to the rise of AI.
“I’m definitely embroiled in that space,” Walters said. “Between the two companies that I associated with, I kind of have both sides of that. On the Upstreet [Labs] side, a big part of the way that we’re going to make AI models get better is by capturing user data and using it as training data to drive models. So we’re kind of upfront saying, ‘Hey, here’s a free-to-play game; we’re very transparent about what we do with the data. And it is to improve the experience of the game, make the game self-improve.’”
Walters said that mining private data that sits behind a password, or collecting data in violation of an agreement, is illegal and poses a significant liability. It is best to avoid such data altogether, he said.
According to Walters, companies like Meta (formerly Facebook), which owns Instagram, and Google collect the most data.
“They collect a ton of data –– everything data is their business because they advertise based on hyper-targeting,” he said. “So, every time you use [social media], they’re out there collecting, and that’s how most of it goes. For the rest of us, we scrape the web. And legally, you’re allowed to scrape the web because if anybody makes something public, you can go see it.”
Some websites use anti-scraping technology to protect their data. OpenAI supports the robots.txt standard, under which a website can publish a robots.txt file telling crawlers which parts of the site they should not crawl. Data crawling is the automated collection of data from web sources by following links, much as search engines gather information from the pages they index.
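For readers curious how a crawler honors a robots.txt file in practice, the following is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleBot` and the example.com URLs are all hypothetical and chosen only for illustration; they are not taken from any company mentioned in this story.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is kept out of /private/,
# and every other crawler is allowed everywhere (an empty Disallow).
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this crawler to fetch this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleBot", "https://example.com/private/report.html"),
        ("ExampleBot", "https://example.com/blog/post"),
        ("OtherBot", "https://example.com/private/report.html"),
    ]:
        verdict = "allowed" if may_crawl(rules, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

The check is voluntary on the crawler's side: robots.txt is a request rather than a technical barrier, which is why some sites pair it with anti-scraping technology.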
“AI companies are not supposed to steal your data. The biggest concerns right now, like hacking, is always going to be a thing,” Walters said. “The biggest concern is AI and what AI is doing to people’s data without them knowing or wanting that to be the case. That’s the big controversy in data mining right now.”
The growth of AI and new technological tools are making traditional data mining approaches outdated. Large language models automate the process, reducing the need for skilled engineers. AI agents and deep search will increasingly rely on these powerful tools.
Victor Uong is a second-year master’s student majoring in business analytics. Before coming to SFSU, Uong was a chemical engineer with no prior data mining experience.
“When I graduated, I worked as a chemical engineer and I found out that I didn’t like that at all,” Uong said. “And I didn’t know what topic that I’d like. But when I was working, there was one part of my work, my job, that I enjoyed doing. And that was working with data, like anything analytics.”
During the first week of CSC 869, students were introduced to how data is used professionally, including the role of data scientists, methods of analysis and ways to ensure data accuracy. In the coming weeks, they will explore different categories of data, techniques for handling it and strategies for working with data that is imbalanced or not normally distributed.