“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran
In my experience, it is essential to build your argument or position with supporting data at hand. We can’t win arguments with ‘I think…’; we win them with ‘here is the proof…’. If you cannot explain it simply, you don’t understand it well enough!
Data management is key to any business, and it differentiates today’s thriving organizations.
Data in all forms and sizes is being generated faster than ever before.
At many organisations there is a struggle between IT and the business: IT has to run a huge portfolio of apps, the business always wants more apps, and IT is struggling just to keep running what it already has. Acknowledging this struggle also helps to establish that you are an expert and that you understand their challenges. So the ideal approach is not to treat data management as just another project, but to design the solution as an evolving process.
- What makes data science so special that it is becoming popular?
- How could you elevate your data mining/machine learning skills for data science?
- Where can statistics and operational research help you reach a stepping stone in a data science career?
These days new job titles have become buzzwords, attached to core roles and responsibilities that already existed. Broadly, whoever deals with data, either collecting or analysing it, is called a data analyst. You will need to draw a line (or a virtual wall) between a data analyst and a business intelligence developer. According to Big Data University, data analysts need a baseline understanding of the following skills and tools:
- Skills: statistics, data munging, data visualization, exploratory data analysis
- Tools: Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Access, Tableau, SSAS.
What differentiates Data Engineers is that they prepare the ‘data’ infrastructure that will be analysed by the Data Scientists. Software engineers are also essential to design, build and integrate data from different sources. Writing complex queries on the data, and making it possible to access, analyse and process that data to optimise business performance: this is what the Big Data ecosystem is.
As the RDBMS platform has matured over time, both data warehousing and business intelligence have evolved into a key route to organisational success and business growth. With this as a baseline, IT skills must build on several analytical disciplines that can help the organisation grow within the Data Platform.
So mathematical skills are essential in this discipline; they create the differences and denominators, by design. There are multiple categories in which the data science and data scientist domain is growing, see here.
A key skill to develop in analytics is knowledge across the whole spectrum of business acumen and domain expertise. There is no doubt that mathematical skills built upon your academic background will help you step into the data science world at a better place. Data science sprawls across multiple disciplines and domains, and dominates many of them. Based on my research and collection, the following are highlights of where one can begin a data science journey:
- Computer Science – branches into multiple sectors across software, hardware, application and business arenas. The newer concepts are data plumbing (in-memory analytics), machine learning programming, modeling (Python, R etc.) and RFID/streaming data analytics.
- Statistician – a baseline for performing a series of experiments through testing, cross-validation, sampling and programming methods.
- Data Mining – there is evidence that data mining and machine learning overlap. Either of these will land you in the core of data science.
- Research – operational research, and building optimisations and techniques, will let you move into data analytics and data science.
- Business Intelligence/Data Warehouse – coming from the mature RDBMS world, both of these have better benchmarks for designing and creating KPIs, database schemas, dashboards and visualisations based on data-driven strategies to build/optimize/abstract better decisions and ROI.
- Machine Learning – this discipline, closely related to data mining, is needed to keep up with new changes in the IT field. It is very specific to building algorithms and designing automated prototypes based on data sets. A further dive into building core algorithms, including clustering and supervised classification, rule systems, and scoring techniques, is a hot trade now, and a flavour of AI (artificial intelligence) is a bonus for you. This is where Python and R balance each other.
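As a small illustration of the cross-validation and supervised classification mentioned in the list above, here is a minimal sketch in plain Python. The function names (`k_fold_indices`, `nearest_centroid_fit`, `cross_validate`) and the toy data are my own assumptions for illustration, and a nearest-centroid rule on a single feature stands in for whatever classifier you would really use.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(xs, ys):
    """Learn one centroid (mean feature value) per class label."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cross_validate(xs, ys, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        # Train only on the rows that are NOT in the current fold.
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = nearest_centroid_fit(train_x, train_y)
        correct = sum(
            nearest_centroid_predict(model, xs[i]) == ys[i] for i in fold
        )
        scores.append(correct / len(fold))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: one numeric feature, two classes.
    xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
    ys = ["low"] * 5 + ["high"] * 5
    print(cross_validate(xs, ys, k=5))
```

The point of the sketch is the habit, not the model: every score comes from data the model did not see during fitting, which is the statistician's discipline applied to a machine learning task.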
A few more references from the world wide web (mainly from Analytical Bridge website):
- Data mining: This discipline is about designing algorithms to extract insights from rather large and potentially unstructured data (text mining), sometimes called nugget discovery, for instance unearthing a massive … Techniques include pattern recognition, feature selection, clustering and supervised classification, and encompass a few statistical techniques (though without the p-values or confidence intervals attached to most statistical methods being used). Data mining is applied computer engineering, rather than a mathematical science. Data miners use open source software such as Rapid Miner.
- Predictive modeling: Not a discipline per se, predictive modeling projects occur in all industries across all disciplines. Predictive modeling applications aim at predicting the future based on past data, usually but not always based on statistical modeling. Predictions often come with confidence intervals. The roots of predictive modeling are in statistical science.
- Statistics. Currently, statistics is mostly about surveys (typically performed with SPSS software), theoretical academic research, bank and insurance analytics (marketing mix optimization, cross-selling, fraud detection, usually with SAS and R), statistical programming, social sciences, global warming research (and space weather modeling), economic research, clinical trials (pharmaceutical industry), medical statistics, epidemiology, bio-statistics and government statistics.
Jobs requiring a security clearance are well paid and relatively secure, but the well paid jobs in the pharmaceutical industry (the golden goose for statisticians) are threatened by a number of factors – outsourcing, company mergers, and pressures to make healthcare affordable.
- Mathematical optimization. Solves business optimization problems with techniques such as the simplex algorithm, Fourier transforms (signal processing), differential equations, and software such as Matlab. These applied mathematicians are found in big companies such as IBM, research labs, NSA (cryptography) and in the finance industry (sometimes recruiting physics or engineering graduates). Mathematical optimization is however closer to operations research than statistics; the choice of hiring a mathematician rather than another practitioner (data scientist) is often dictated by historical reasons, especially for organizations such as NSA or IBM.
- Actuarial sciences. Just a subset of statistics focusing on insurance (car, health, etc.) using survival models: predicting when you will die, what your health expenditures will be based on your health status (smoker, gender, previous diseases) to determine your insurance premiums. They have seen their average salary increase nicely over time: access to the profession is restricted and regulated just like for lawyers, for no other reasons than protectionism to boost salaries and reduce the number of qualified applicants to job openings. Actuarial sciences is indeed data science (a sub-domain).
- HPC. High performance computing, not a discipline per se, but should be of concern to data scientists, big data practitioners, computer scientists and mathematicians, as it can redefine the computing paradigms in these fields. HPC should not be confused with Hadoop and Map-Reduce. HPC is hardware-related, Hadoop is software-related (though heavily relying on Internet bandwidth and server configuration and proximity).
- Six sigma. It’s more a way of thinking (a business philosophy, if not a cult) rather than a discipline, used for quality control and to optimize engineering processes. Applied, simple statistics are used (simple stuff works most of the time, I agree), and the idea is to eliminate sources of variance in business processes, to make them more predictable and improve quality.
- Artificial intelligence. It’s coming back. The intersection with data science is pattern recognition (image analysis) and the design of automated (some would say intelligent) systems to perform various tasks, in machine-to-machine communication mode, such as identifying the right keywords (and right bid) on Google AdWords (pay-per-click campaigns involving millions of keywords per day).
- Data engineering. New kid on the block, performed by software engineers (developers) or architects (designers) in large organizations (sometimes by data scientists in tiny companies), this is the applied part of computer science that allows all sorts of data to be easily processed in-memory or near-memory, and to flow nicely to (and between) end-users, including heavy data consumers such as data scientists.
- Business intelligence. Abbreviated as BI. Focuses on dashboard creation, metric selection, producing and scheduling data reports (statistical summaries) sent by email or delivered/presented to executives, competitive intelligence (analyzing third party data), as well as involvement in database schema design (working with data architects) to collect useful, actionable business data efficiently.
- Data analysis. This is the new term for business statistics since at least 1995, and it covers a large spectrum of applications including fraud detection, advertising mix modeling, attribution modeling, sales forecasts, cross-selling optimization (retail), user segmentation, churn analysis, computing the lifetime value of a customer and cost of acquisition, and so on.
- Business analytics. Same as data analysis, but restricted to business problems only. Tends to have a bit more of a financial, marketing or ROI flavor.
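To make the churn analysis and customer value items under data analysis a little more concrete, here is a rough sketch in Python. It assumes a constant churn rate per period, so the expected customer lifetime is 1 / churn; this is a common simplification, not a method taken from the sources quoted above, and the function names and figures are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers lost during one period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def lifetime_value(revenue_per_period, gross_margin, churn):
    """Simplified customer value, assuming churn stays constant.

    With a constant churn rate the expected lifetime is 1 / churn periods,
    so value = margin earned per period * expected number of periods.
    """
    if not 0 < churn <= 1:
        raise ValueError("churn must be in (0, 1]")
    expected_lifetime = 1 / churn
    return revenue_per_period * gross_margin * expected_lifetime


if __name__ == "__main__":
    # Hypothetical figures: 200 customers at the start of the month, 10 lost.
    monthly_churn = churn_rate(200, 10)
    print(monthly_churn, lifetime_value(50.0, 0.6, monthly_churn))
```

Comparing this value with the cost of acquiring a customer is the kind of ROI-flavoured question that business analytics tends to ask.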
The first step is to discover whether you are an analyst ‘by nature’ or a developer by inclination within the IT world. Sometimes the job title will mislead, so it is better to read the definition of the role and list where you will excel. The four pillars of excellence are: university degree, technical skills, business skills (new requirement) and professional certification.
Finally, networking is essential to keep up with the latest happenings in the world and to see how a simple business is attempting to make a big change in day-to-day life. If you are a ‘geek’, participate in ‘hackathon’-type events, or as a developer you could contribute to the technical community through open source projects (search for GitHub).